
WO2021210492A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2021210492A1
WO2021210492A1 (PCT/JP2021/014938)
Authority
WO
WIPO (PCT)
Prior art keywords
information
processing device
information processing
unit
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/014938
Other languages
French (fr)
Japanese (ja)
Inventor
貴之 猿田
亮 水谷
達雄 古賀
仁紀 木内
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kajima Corp
Preferred Networks Inc
Original Assignee
Kajima Corp
Preferred Networks Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kajima Corp, Preferred Networks Inc filed Critical Kajima Corp
Publication of WO2021210492A1 publication Critical patent/WO2021210492A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B29/00Maps; Plans; Charts; Diagrams, e.g. route diagram

Definitions

  • An embodiment of the present invention relates to an information processing device, an information processing method, and a program.
  • Robots and the like are known that estimate their own position and generate map information by recognizing the positions and shapes of surrounding objects from the sensing results of sensors or from captured images.
  • The information processing device of the embodiment includes at least one memory and at least one processor. The at least one processor acquires a detection result, which includes at least one of the surrounding state of the information processing device and the state of the information processing device, together with environmental information about the environment around the information processing device, and estimates the self-position and generates map information based on the environmental information and the detection result.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of the information processing apparatus according to the first embodiment.
  • FIG. 2 is a block diagram showing an example of a function provided in the information processing apparatus according to the first embodiment.
  • FIG. 3 is an image diagram showing an example of tracking processing according to the first embodiment.
  • FIG. 4 is an image diagram showing an example of the positional relationship between the information processing apparatus according to the first embodiment and surrounding objects.
  • FIG. 5 is an image diagram showing an example of bundle adjustment according to the first embodiment.
  • FIG. 6 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the first embodiment.
  • FIG. 7 is a block diagram showing an example of the functions provided in the information processing apparatus according to the second embodiment.
  • FIG. 8 is an image diagram showing an example of the positional relationship between the information processing apparatus according to the second embodiment and surrounding objects.
  • FIG. 9 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the second embodiment.
  • FIG. 10 is a block diagram showing an example of the functions provided in the information processing apparatus according to the third embodiment.
  • FIG. 11 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the third embodiment.
  • FIG. 12 is a diagram showing an example of segmentation of a captured image according to a fourth embodiment.
  • FIG. 13 is a diagram showing an example of map information according to the second modification.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of the information processing apparatus 1 according to the first embodiment.
  • the information processing device 1 includes a main body 10, a moving device 16, an imaging device 17, and an IMU (Inertial Measurement Unit) sensor 18.
  • the moving device 16 is a device capable of moving the information processing device 1.
  • the moving device 16 has a plurality of wheels and a motor for driving these wheels, and is connected to the lower part of the main body 10 so as to support the main body 10.
  • The moving device 16 can move the information processing device 1 in, for example, a building under construction, a completed building, a station platform, or a factory.
  • In the present embodiment, the case where the information processing device 1 moves in a building under construction will be described as an example.
  • The means of locomotion of the information processing device 1 is not limited to wheels, and may be caterpillars, propellers, or the like.
  • The information processing device 1 is, for example, a robot, a drone, or the like. In the present embodiment, the information processing device 1 is assumed to move autonomously, but the information processing device 1 is not limited to this.
  • the image pickup device 17 is, for example, a stereo camera in which two cameras arranged side by side are set as one set.
  • the image pickup device 17 transmits the captured image data captured by the two cameras to the main body 10 in association with each other.
  • the IMU sensor 18 is a sensor in which a gyro sensor, an acceleration sensor, and the like are integrated, and measures the angular velocity and acceleration of the information processing device 1.
  • the IMU sensor 18 sends the measured angular velocity and acceleration to the main body 10.
  • the IMU sensor 18 may further include not only a gyro sensor and an acceleration sensor, but also a magnetic sensor, a GPS (Global Positioning System) device, and the like.
  • the image pickup device 17 and the IMU sensor 18 are collectively referred to as a detection unit.
  • the detection unit may further include various sensors.
  • the information processing device 1 may further include a distance measuring sensor such as an ultrasonic sensor or a laser scanner.
  • the term "detection” refers to imaging the surroundings of the information processing device 1, measuring the angular velocity or acceleration of the information processing device 1, and the distance to an object around the information processing device 1. It shall include measuring the distance.
  • the detection result by the detection unit includes at least one of the surrounding state of the information processing device 1 and the state of the information processing device 1.
  • the detection result may include both information about the surrounding state of the information processing device 1 and the state of the information processing device 1, or may relate to the surrounding state of the information processing device 1 and the state of the information processing device 1. It may contain only one of the information.
  • the surrounding state of the information processing device 1 is, for example, an captured image of the surroundings of the information processing device 1, a distance measurement result of a distance between an object around the information processing device 1 and the information processing device 1.
  • the state of the information processing device 1 is, for example, the angular velocity and acceleration measured by the IMU sensor 18.
  • the captured image captured by the imaging device 17 is an example of the detection result of the surrounding state of the information processing device 1.
  • the detection result includes at least the captured image, but may further include other information.
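  • As an illustration only (these names and types are not from the patent), the detection result described above can be pictured as a simple container that always carries a captured image and optionally carries IMU and range measurements; a minimal Python sketch follows.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class DetectionResult:
    """Hypothetical container for one detection cycle (illustrative names only)."""
    left_image: np.ndarray                          # stereo pair from the imaging device 17
    right_image: np.ndarray
    angular_velocity: Optional[np.ndarray] = None   # rad/s, from the IMU sensor 18
    acceleration: Optional[np.ndarray] = None       # m/s^2, from the IMU sensor 18
    range_scan: Optional[np.ndarray] = None         # optional distance-sensor measurement
```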
  • The main body 10 may be realized as a computer including a processor 11, a main storage device 12 (memory), an auxiliary storage device 14 (memory), a network interface 13, and a device interface 15, which are connected via a bus 19.
  • the image pickup device 17 and the IMU sensor 18 may be incorporated in the main body 10.
  • The processor 11 may be an electronic circuit (processing circuit or processing circuitry) including a control device and an arithmetic device of a computer, such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit). Further, the processor 11 may be a semiconductor device or the like including a dedicated processing circuit. The processor 11 is not limited to an electronic circuit using electronic logic elements, and may be realized by an optical circuit using optical logic elements. Further, the processor 11 may include a calculation function based on quantum computing.
  • the processor 11 can perform arithmetic processing based on the data and software (program) input from each apparatus and the like of the internal configuration of the information processing apparatus 1, and output the arithmetic result and the control signal to each apparatus and the like.
  • the processor 11 may control each component constituting the information processing device 1 by executing an OS (Operating System) of the information processing device 1, an application, or the like.
  • the main storage device 12 is a storage device that stores instructions executed by the processor 11, various data, and the like, and the information stored in the main storage device 12 is read out by the processor 11.
  • the auxiliary storage device 14 is a storage device other than the main storage device 12. Note that these storage devices mean arbitrary electronic components capable of storing electronic information, and may be semiconductor memories.
  • the semiconductor memory may be either a volatile memory or a non-volatile memory.
  • The storage device that stores various data in the information processing device 1 in the present embodiment may be realized by the main storage device 12 or the auxiliary storage device 14, or by a memory built into the processor 11.
  • the main storage device 12 or the auxiliary storage device 14 is also referred to as a storage unit.
  • A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor.
  • When the information processing device 1 in the present embodiment is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to that storage device (memory), it may include a configuration in which at least one of the plurality of processors is connected (coupled) to the at least one storage device (memory). Further, this configuration may be realized by storage devices (memories) and processors included in a plurality of computers. Further, a configuration in which the storage device (memory) is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.
  • the network interface 13 is an interface for connecting to the communication network 3 wirelessly or by wire.
  • an appropriate interface such as one conforming to an existing communication standard may be used.
  • Information may be exchanged with the external device 2 connected via the communication network 3 by the network interface 13.
  • The communication network 3 may be any of a WAN (Wide Area Network), a LAN (Local Area Network), a PAN (Personal Area Network), or a combination thereof, as long as information is exchanged between the information processing device 1 and the external device 2.
  • An example of a WAN is the Internet, examples of a LAN are IEEE 802.11 and Ethernet (registered trademark), and examples of a PAN are Bluetooth (registered trademark) and NFC (Near Field Communication).
  • the device interface 15 is an interface that directly connects to the mobile device 16, the image pickup device 17, and the IMU sensor 18.
  • the device interface 15 is an interface that conforms to a standard such as USB (Universal Serial Bus), but is not limited thereto. Further, the device interface 15 may be further connected to an external device other than the various devices shown in FIG.
  • the external device 2 is, for example, a server device or the like.
  • the external device 2 is connected to the information processing device 1 via a communication network 3.
  • the external device 2 of this embodiment stores the three-dimensional design information of the building in advance.
  • the three-dimensional design information of the building is, for example, BIM (Building Information Modeling) information.
  • BIM information includes information on the three-dimensional structure of a building and information on materials such as building materials.
  • the three-dimensional design information of the building is not limited to BIM information, and may be 3D CAD (Computer-Aided Design) data or the like.
  • the three-dimensional design information is an example of environmental information in this embodiment.
  • the environmental information is information about the environment around the information processing device 1.
  • the environmental information includes at least one of information about a building in which the information processing device 1 travels, information about a person or an object existing around the information processing device 1, information about the weather, and information about lighting.
  • the above-mentioned three-dimensional design information is an example of information on the building in which the information processing device 1 travels among the environmental information.
  • the environmental information may be a combination of a plurality of types of information, or may include only one type of information.
  • the environmental information includes at least three-dimensional design information, but may further include other information regarding the environment around the information processing device 1.
  • The term "object" in this embodiment includes structures such as walls and pillars, furniture, moving objects, temporary objects, people, and the like.
  • the information processing device 1 and the external device 2 are wirelessly connected, but the information processing device 1 and the external device 2 may be connected by wire. Further, the information processing device 1 does not have to be always connected to the external device 2.
  • FIG. 2 is a block diagram showing an example of the functions included in the information processing apparatus 1 according to the first embodiment.
  • The information processing device 1 includes an acquisition unit 101, a conversion unit 102, a SLAM (Simultaneous Localization and Mapping) processing unit 120, and a movement control unit 105. Further, the SLAM processing unit 120 includes a tracking unit 103 and a bundle adjustment unit 104.
  • the acquisition unit 101 acquires the detection result of the surrounding state of the information processing device 1 or the state of the information processing device 1 and the environmental information regarding the environment around the information processing device 1.
  • the acquisition unit 101 acquires BIM information from the external device 2 via, for example, the network interface 13.
  • the acquisition unit 101 stores the acquired BIM information in the auxiliary storage device 14.
  • The acquisition unit 101 acquires a captured image from the imaging device 17 via the device interface 15. In addition, the acquisition unit 101 acquires the angular velocity and acceleration from the IMU sensor 18 via the device interface 15.
  • the conversion unit 102 converts the environmental information into an input value of at least one of the self-position estimation process by the SLAM processing unit 120 described later or the map information generation process.
  • the conversion unit 102 may convert the environmental information into the input values of both the self-position estimation process and the map information generation process, or may convert only the input values of either process.
  • conversion includes generating other information from the environmental information or extracting, acquiring or searching the information from the environmental information.
  • The conversion unit 102 generates, from the BIM information, the initial values of the three-dimensional coordinates (world coordinates) of points in three-dimensional space used in the bundle adjustment process by the bundle adjustment unit 104 described later.
  • Based on the current position and orientation of the imaging device 17 specified by the tracking unit 103 described later, the conversion unit 102 identifies, among the structures such as walls and columns included in the BIM information, the structures contained in the imaging range of the imaging device 17. Then, the conversion unit 102 specifies, from the BIM information, the three-dimensional coordinates (world coordinates) of the structures included in the imaging range of the imaging device 17. As an example, the conversion unit 102 acquires, from an external device or the like, the world coordinates of one point of the building represented by the BIM information and, with that point as a reference, converts the three-dimensional coordinates in the BIM information of each point included in the building into world coordinates.
  • the method of obtaining the world coordinates of each point included in the building from the BIM information is not limited to this.
  • the conversion unit 102 sends the specified three-dimensional coordinates (world coordinates) to the bundle adjustment unit 104 as the initial value of the three-dimensional coordinates (world coordinates) of the points in the three-dimensional space in the bundle adjustment process described later.
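  • The following is a minimal, hypothetical sketch (not from the patent) of this conversion step: given a calibration between the BIM frame and the world/SLAM frame and the current camera pose from the tracking unit 103, BIM structure points that fall inside the imaging range are transformed to world coordinates and handed to the bundle adjustment as initial values. All function and variable names are illustrative.

```python
import numpy as np

def bim_points_to_initial_values(bim_points, R_bw, t_bw, R_wc, t_wc, K,
                                 image_size, max_depth=50.0):
    """Hypothetical sketch of the conversion unit 102.

    bim_points: (N, 3) structure points in the BIM coordinate system.
    R_bw, t_bw: calibration from the BIM frame to the world (SLAM) frame.
    R_wc, t_wc: current camera pose from the tracking unit 103 (world -> camera).
    K: 3x3 camera intrinsics; image_size: (width, height).
    Returns world coordinates of BIM points inside the imaging range, to be used
    as initial values of the bundle adjustment.
    """
    # BIM coordinates -> world coordinates
    pts_world = bim_points @ R_bw.T + t_bw

    # world coordinates -> camera coordinates
    pts_cam = pts_world @ R_wc.T + t_wc

    # keep only points in front of the camera and within a depth limit
    valid = (pts_cam[:, 2] > 0.1) & (pts_cam[:, 2] < max_depth)
    pts_world, pts_cam = pts_world[valid], pts_cam[valid]

    # project and keep points that land inside the image
    uv = pts_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]
    w, h = image_size
    in_image = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)

    return pts_world[in_image]
```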
  • the details of the bundle adjustment process will be described later.
  • the three-dimensional coordinates (world coordinates) of the points included in the building 9 specified by the conversion unit 102 from the BIM information are examples of information regarding the positions of surrounding objects in the present embodiment.
  • the three-dimensional coordinates in the present embodiment are world coordinates.
  • the conversion unit 102 may specify the range of the initial value without specifying the initial value as a unique value.
  • the conversion unit 102 may provide a range instead of specifying the three-dimensional coordinates of a point in the three-dimensional space as unique coordinates.
  • In this case, a region of three-dimensional space that is likely to contain a point on a structure within the imaging range of the imaging device 17 is sent to the bundle adjustment unit 104 as the range of initial values.
  • the initial value or the range of the initial value is an example of the input value generated by the conversion unit 102 in the present embodiment.
  • the calibration means that the correspondence between the position in the BIM information and the position in the SLAM coordinate system is defined.
  • the position of the movement start point of the information processing device 1 in the BIM information in the three-dimensional coordinate system may be stored in the auxiliary storage device 14 as a reference point. Therefore, the conversion unit 102 can specify the position in the building represented by the BIM information corresponding to the position of the image pickup device 17 specified by the tracking unit 103.
  • The calibration between the three-dimensional coordinate system of the BIM information and the SLAM coordinate system may be executed through an input operation by an administrator or the like, or by the SLAM processing unit 120 recognizing, from the captured image, an index such as an AR (Augmented Reality) marker installed in the building.
  • the SLAM processing unit 120 simultaneously estimates the self-position and generates map information.
  • the self-position is the position and orientation of the information processing device 1.
  • In the present embodiment, the position and orientation of the imaging device 17 are treated as representing the position and posture of the information processing device 1.
  • Alternatively, the SLAM processing unit 120 may estimate the self-position, that is, the position of the information processing device 1, by correcting the displacement between the imaging device 17 and the center of the information processing device 1.
  • the map information represents the shape of the surrounding structure along the movement locus of the information processing device 1. More specifically, the map information of the present embodiment represents the internal structure of the building in which the information processing device 1 travels in three dimensions along the movement locus of the information processing device 1.
  • the map information of the present embodiment is, for example, a point cloud map in which the internal structure of the building in which the information processing device 1 travels is represented as a point cloud having three-dimensional coordinates.
  • the type of map information is not limited to this, and the map may be represented by a set of three-dimensional figures instead of a point cloud.
  • the map information is also called an environment map.
  • the SLAM processing unit 120 is an example of an estimation unit in this embodiment.
  • As a method for estimating the self-position and generating map information, a method other than SLAM may be adopted. Further, the estimation of the self-position and the generation of the map information do not have to be performed at the same time, and one process may be completed before the other is executed.
  • The generation of map information in the present specification includes at least one of newly generating map information, adjusting already generated map information, and updating already generated map information.
  • the SLAM processing unit 120 includes a tracking unit 103 and a bundle adjustment unit 104.
  • the tracking unit 103 identifies the position and orientation of the image pickup device 17 by tracking a plurality of captured images captured by the image pickup device 17 at different times.
  • the tracking unit 103 is an example of a specific unit in the present embodiment.
  • the image pickup device 17 captures the surroundings while moving as the information processing device 1 moves.
  • the tracking unit 103 calculates changes in the position and posture of the image pickup device 17 by tracking points drawn on a certain captured image on another captured image captured at different times.
  • the tracking unit 103 specifies the current position and orientation of the imaging device 17 by adding changes in the position and orientation specified by the tracking process to the position and orientation of the imaging device 17 at the start of imaging.
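  • As a rough illustration of the accumulation step just described (not taken from the patent), the current pose can be obtained by composing the previously corrected pose with the relative motion estimated by tracking; the names below are hypothetical.

```python
import numpy as np

def accumulate_pose(R_prev, t_prev, R_rel, t_rel):
    """Compose the pose at the previous (key) frame with the relative motion
    estimated by the tracking process to obtain the current camera pose.

    (R_prev, t_prev): rotation and translation of the imaging device at the
    previous frame, expressed in world coordinates.
    (R_rel, t_rel): relative rotation and translation between that frame and
    the current frame, as estimated by tracking.
    """
    R_cur = R_prev @ R_rel
    t_cur = R_prev @ t_rel + t_prev
    return R_cur, t_cur
```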
  • FIG. 3 is an image diagram showing an example of tracking processing according to the first embodiment.
  • the reference frame 41 and the target frame 42 are captured images captured at different times by the imaging device 17.
  • The reference frame 41 is a captured image captured before the target frame 42, and the imaging device 17 is assumed to have moved from the position T_i at the time the reference frame 41 was captured to the position T_j at the time the target frame 42 was captured.
  • the reference frame 41 is also referred to as a key frame
  • the target frame 42 is also referred to as a current frame.
  • The tracking unit 103 calculates the relative amount of movement of the imaging device 17 from the position T_i to the position T_j by computing the photometric error between the point P depicted in the reference frame 41 and the corresponding point depicted in the target frame 42.
  • the point P is, for example, a feature point on the reference frame 41.
  • the movement of the image pickup apparatus 17 includes both a change in the position of the image pickup apparatus 17 and a change in the posture (orientation).
  • The position T_i at the time the reference frame 41 was captured is assumed to have already been error-corrected.
  • the point 50a shown in FIG. 3 represents the position where the point P drawn on the reference frame 41 is back-projected on the three-dimensional space.
  • The tracking unit 103 calculates the photometric error E_pj between the reference frame 41 and the target frame 42 using the following equation (1).
  • In equation (1), I_i represents the reference frame 41 and I_j represents the target frame 42.
  • N_p is a neighborhood pattern of pixels including the point P on the reference frame 41.
  • t_i represents the exposure time of the reference frame 41, and t_j represents the exposure time of the target frame 42.
  • p' is the point obtained by projecting the point P onto the target frame 42 using the inverse depth d_p.
  • The tracking unit 103 calculates the photometric error E_pj using the Huber norm.
  • The weighting coefficient W_p is calculated in advance based on the brightness gradient of the pixels. For example, noise can be reduced by lowering the value of the weighting coefficient W_p for pixels with a larger gradient.
  • The luminance conversion hyperparameters a_i, a_j, b_i, and b_j are parameters for converting the luminance of the reference frame 41 and the target frame 42, and may be tuned manually, for example by an administrator.
  • The following equation (2) is a constraint condition on the point p', which is the projection point of the point P used in equation (1).
  • In equation (2), a back-projection function that back-projects the point P drawn on the reference frame 41 to the point 50a in three-dimensional space and a projection function that projects the point 50a in three-dimensional space onto the target frame 42 are used.
  • The distance from the point P to the point 50a is the depth (d_p) of the point 50a in the reference frame 41.
  • The coefficient R included in equation (2) represents the amount of rotation of the imaging device 17.
  • The coefficient t represents the amount of translation of the imaging device 17.
  • The coefficient R and the coefficient t are defined from the relative positions of the imaging device 17 according to the following constraint condition (3).
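  • The equations themselves are not reproduced in this text. As a hedged reconstruction consistent with the description above (Huber norm, exposure times t_i and t_j, brightness-transfer parameters a_i, a_j, b_i, b_j, weight W_p, and projection with inverse depth d_p), a standard direct-method photometric error of the following form may be assumed, where ||·||_γ denotes the Huber norm, Π the projection function, Π⁻¹ the back-projection function, and T_i, T_j the poses at the times the two frames were captured; the exact expressions in the patent figures may differ.

```latex
% Hedged reconstruction of equations (1)-(3); notation follows the description above.
\begin{align*}
E_{pj} &= \sum_{p \in N_p} W_p \,
  \Bigl\| \bigl(I_j[\,p'\,] - b_j\bigr)
  - \frac{t_j\, e^{a_j}}{t_i\, e^{a_i}} \bigl(I_i[\,p\,] - b_i\bigr) \Bigr\|_{\gamma}
  && \text{(1)} \\
p' &= \Pi\!\bigl( R\, \Pi^{-1}(p,\, d_p) + t \bigr)
  && \text{(2)} \\
\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}
  &= T_j\, T_i^{-1}
  && \text{(3)}
\end{align*}
```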
  • The tracking unit 103 specifies the position T_j of the imaging device 17 at the time the target frame I_j was captured by solving the model of the photometric error E_pj between the reference frame I_i and the target frame I_j shown in the above equations (1) to (3).
  • The positions T_i and T_j shown in equation (3) and FIG. 3 include both the position and the orientation of the imaging device 17. In this way, the tracking unit 103 tracks changes in the position and posture of the imaging device 17 by repeatedly executing such tracking processing on a plurality of captured images captured in time series by the imaging device 17.
  • the tracking method is not limited to the above example.
  • Tracking methods include an indirect method, in which the position and orientation of the imaging device 17 at the time each frame is captured are obtained by extracting feature points from the captured image and then solving the matching problem of the feature points, and a direct method, in which the position and orientation of the imaging device 17 at the time each frame is captured are estimated by directly estimating the transformation between captured images without a feature point extraction process.
  • In the above example, the movement of the position and posture of the imaging device 17 is calculated by projecting feature points, but the tracking unit 103 may execute tracking by the direct method. Further, the tracking unit 103 may specify the position and orientation of the imaging device 17 in consideration of not only the captured image but also the detection result of the IMU sensor 18.
  • the tracking unit 103 sends the current position and orientation of the specified imaging device 17 to the bundle adjustment unit 104 and the conversion unit 102.
  • the bundle adjustment unit 104 corrects the position and orientation of the image pickup device 17 specified by the tracking unit 103 and the position information of surrounding objects by the bundle adjustment process.
  • the bundle adjustment unit 104 outputs the self-position of the information processing device 1 and the map information as the processing result.
  • The captured image captured by the imaging device 17 is sent to the bundle adjustment unit 104, which minimizes the reprojection error of each frame.
  • The bundle adjustment unit 104 minimizes the reprojection error of each frame by optimizing the world coordinate points (three-dimensional position coordinates) of each point in the surrounding environment, the position and orientation of the imaging device 17, and the internal parameters of the imaging device 17.
  • the internal parameters of the imaging device 17 do not have to be updated by the bundle adjustment unit 104 if the camera has been calibrated in advance.
  • the internal parameters of the image pickup device 17 are, for example, the focal length and the principal point. In bundle adjustment, the position and orientation of the imaging device 17 are also referred to as external parameters.
  • As described above, the bundle adjustment unit 104 of the present embodiment adopts, as the initial values of the world coordinate points of each point in the surrounding environment, the three-dimensional coordinates indicating the positions of the surrounding structures converted from the BIM information by the conversion unit 102.
  • the bundle adjustment unit 104 adjusts the error of the position and orientation of the image pickup device 17 specified by the tracking unit 103 by this bundle adjustment.
  • The bundle adjustment unit 104 obtains, through bundle adjustment, the world coordinate points of each point, the position and orientation of the imaging device 17, and the internal parameters of the imaging device 17 that minimize the reprojection error, so that a position and orientation of the imaging device 17 with reduced error are obtained as a result.
  • the set of world coordinate points after bundle adjustment becomes map information.
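  • As a hedged sketch of the objective described above (the patent's equation (4) may differ in its exact form), a generic bundle-adjustment cost over the world coordinate points X_i, initialized from the BIM information, and the camera poses (R_j, t_j) can be written as follows, with K the pre-calibrated intrinsics, x_ij the observed feature point, and π the perspective projection.

```latex
% Hedged sketch of the reprojection-error objective; not the patent's exact equation (4).
\min_{\{X_i\},\,\{R_j,\,t_j\}} \;
  \sum_{i,j} \bigl\| \, x_{ij} - \pi\!\bigl( K \,(R_j X_i + t_j) \bigr) \bigr\|^2
```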
  • FIG. 4 is an image diagram showing an example of the positional relationship between the information processing device 1 and surrounding objects according to the first embodiment.
  • the information processing device 1 is assumed to move in the building 9 in which the pillars 90a to 90c are installed. Pillars 90a to 90c are examples of objects.
  • The distance d in FIG. 4 is the distance from the imaging device 17 to the point 52 on the plane 901 of the pillar 90c that faces the information processing device 1.
  • the conversion unit 102 specifies the initial value of the three-dimensional coordinates of the point 52 on the plane 901.
  • the bundle adjustment unit 104 starts the adjustment process from the initial value, and based on the position and orientation of the image pickup device 17 specified by the tracking unit 103 and the captured image, the self-position and the position of the point 52 Adjust the error.
  • the conversion unit 102 corrects the three-dimensional coordinates of the point 52 by adjusting the error between the self-position and the position of the point 52, and obtains the three-dimensional coordinates with higher accuracy.
  • the bundle adjustment unit 104 estimates the self-position and the three-dimensional coordinates of the point 52.
  • Because the bundle adjustment refines the initial values derived from the BIM information, the bundle adjustment unit 104 can also estimate the position of an object that is not included in the BIM information.
  • FIG. 5 is an image diagram showing an example of bundle adjustment according to the first embodiment.
  • Using the following equation (4), the bundle adjustment unit 104 estimates the position of the imaging device 17 and the three-dimensional coordinates of the point 52 so as to minimize the error between the projection points 401a and 401b, obtained by projecting the point 52 in three-dimensional space onto the two captured images 43 and 44 shown in FIG. 5, and the feature points 402a and 402b corresponding to the point 52 drawn on the captured images 43 and 44.
  • When the captured image 43 and the captured image 44 are distinguished, the captured image 43 is referred to as a first image and the captured image 44 as a second image for convenience.
  • The internal parameters of the imaging device 17 are assumed to have been calibrated in advance and are not included in the parameters to be optimized in equation (4).
  • The initial value generated by the conversion unit 102 described above is used in equation (4) as the initial value of the world coordinate point (the point X_i in three-dimensional space) shown as the point 52. Further, in FIG. 5, the lines connecting the reference points 170a and 170b, which represent the positions of the imaging device 17, to the point 52 are referred to as ray bundles 6a and 6b. Also, when a range of initial values is set by the conversion unit 102, the calculation of equation (4) is started from world coordinates of the point X_i included in the set range. If the error is not minimized within the range of initial values, the optimum value of the point X_i in three-dimensional space may be determined outside that range.
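  • A minimal Python sketch of this kind of bundle adjustment, seeded with BIM-derived initial 3D points, is shown below. It is illustrative only: the residual corresponds to a generic reprojection error rather than the patent's equation (4), and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def reprojection_residuals(params, K, observations, n_frames, n_points):
    """observations: list of (frame_idx, point_idx, observed_uv) tuples."""
    poses = params[:n_frames * 6].reshape(n_frames, 6)   # axis-angle + translation
    points = params[n_frames * 6:].reshape(n_points, 3)  # world coordinate points
    residuals = []
    for f, i, uv_obs in observations:
        R = Rotation.from_rotvec(poses[f, :3]).as_matrix()
        t = poses[f, 3:]
        Xc = R @ points[i] + t                 # world -> camera
        uv = (K @ Xc)[:2] / Xc[2]              # perspective projection
        residuals.append(uv - uv_obs)
    return np.concatenate(residuals)


def bundle_adjust(K, observations, initial_poses, bim_initial_points):
    """Bundle adjustment seeded with initial values converted from BIM information.

    initial_poses: (n_frames, 6) poses from the tracking unit (axis-angle + translation).
    bim_initial_points: (n_points, 3) world coordinates generated by the conversion unit.
    """
    n_frames, n_points = len(initial_poses), len(bim_initial_points)
    x0 = np.concatenate([np.asarray(initial_poses).ravel(),
                         np.asarray(bim_initial_points).ravel()])
    result = least_squares(reprojection_residuals, x0, loss="huber",
                           args=(K, observations, n_frames, n_points))
    refined_poses = result.x[:n_frames * 6].reshape(n_frames, 6)
    map_points = result.x[n_frames * 6:].reshape(n_points, 3)
    return refined_poses, map_points  # self-position estimate and map information
```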
  • The bundle adjustment unit 104 estimates the position of the plane or curved surface of a surrounding object based on the BIM information, and calculates the distance from the imaging device 17 to the surrounding object based on the constraint condition that a plurality of points existing in the surroundings are located on that plane or curved surface.
  • the points 50b to 50d shown in FIG. 4 all exist on the plane 901.
  • the bundle adjustment unit 104 imposes a constraint condition by the equation of a plane when executing the bundle adjustment process based on the captured image with the plane 901 as the imaging range.
  • When the points 50a to 50d in the three-dimensional space are not particularly distinguished, they are simply referred to as points 50.
  • The bundle adjustment unit 104 estimates the position of the point 50 and the position of the imaging device 17 under the constraint that the point lies on the plane, by solving the optimization problem with the nonlinear least squares method using the nonlinear functions f(x) and g(x).
  • The function f(x) corresponds to the above equation (4).
  • To solve this constrained problem, the penalty method or the augmented Lagrangian method can be applied, but other solution methods may also be adopted.
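  • A hedged sketch of this constrained formulation (the patent's equations (5) and (6) are not reproduced here and may differ): f(x) is the reprojection error of equation (4) and each g_k(x) constrains a point X_k to a plane with normal n and offset d estimated from the BIM information; the penalty method replaces the constraint with a quadratic penalty term weighted by μ.

```latex
% Hedged sketch of the plane-constrained bundle adjustment; illustrative only.
\min_{x} \; f(x)
  \quad \text{s.t.} \quad g_k(x) = n^{\top} X_k + d = 0, \quad k = 1, \dots, m
\qquad\Longrightarrow\qquad
\min_{x} \; f(x) + \frac{\mu}{2} \sum_{k=1}^{m} g_k(x)^2
```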
  • the structure around the information processing device 1 may have a curved surface as well as a flat surface.
  • the outer surface of the pillar 90b is a curved surface.
  • the bundle adjustment unit 104 may impose a constraint condition by a curved surface equation so that a point on a three-dimensional space is on a curved surface based on BIM information.
  • the bundle adjustment unit 104 generates a point cloud having the three-dimensional coordinates as map information based on the three-dimensional coordinates of the plurality of points 50 after the bundle adjustment. Further, the bundle adjustment unit 104 updates the map information by adding or deleting a new point 50 to the map information. In addition, the bundle adjustment unit 104 may adjust the self-position estimation result and the map information in consideration of the detection result of the IMU sensor 18.
  • In this way, the bundle adjustment unit 104 calculates the positions of surrounding objects as the spatial coordinates of the plurality of points 50 in three-dimensional space, and outputs the calculated spatial coordinates of the plurality of points 50 as map information.
  • the term "output" includes storage in the auxiliary storage device 14 or transmission to the external device 2.
  • the bundle adjustment unit 104 stores the estimated self-position and the generated map information in the auxiliary storage device 14. Further, the bundle adjustment unit 104 may transmit the estimated self-position and the generated map information to the external device 2.
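  • The map information described above can be pictured, purely as an illustration, as a growable set of world-coordinate points with the add, delete, and store operations mentioned in the preceding items; the class and method names below are hypothetical.

```python
import numpy as np

class PointCloudMap:
    """Hypothetical sketch of the map information: a point cloud in world coordinates."""

    def __init__(self):
        self.points = np.empty((0, 3))

    def add(self, new_points):
        """Append newly estimated world coordinates (M, 3) after bundle adjustment."""
        self.points = np.vstack([self.points, np.asarray(new_points, dtype=float)])

    def delete(self, indices):
        """Remove points that are no longer needed."""
        self.points = np.delete(self.points, indices, axis=0)

    def save(self, path):
        """'Output' in the sense above: store to local storage (or send to a server)."""
        np.save(path, self.points)
```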
  • the movement control unit 105 moves the information processing device 1 by controlling the movement device 16. For example, the movement control unit 105 searches for a movable route based on the map information stored in the auxiliary storage device 14 and the current self-position. The movement control unit 105 controls the movement device 16 based on the search result.
  • When the information processing device 1 includes sensors such as a distance measuring sensor, the movement control unit 105 may generate a route that avoids obstacles based on the detection results of obstacles or the like by those sensors.
  • the movement control method of the information processing device 1 is not limited to these, and various autonomous movement methods can be applied.
  • FIG. 6 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the first embodiment.
  • the acquisition unit 101 acquires BIM information from the external device 2 (S1).
  • the acquisition unit 101 stores the acquired BIM information in the auxiliary storage device 14.
  • the movement control unit 105 starts the movement of the information processing device 1 by controlling the movement device 16 (S2).
  • The acquisition unit 101 acquires a captured image from the imaging device 17.
  • the acquisition unit 101 acquires sensing results such as angular velocity and acceleration from the IMU sensor 18 (S3).
  • the tracking unit 103 identifies the current position and orientation of the image pickup device 17 based on the captured image (S4).
  • the conversion unit 102 generates an initial value of the three-dimensional coordinates of the point in the structure from the BIM information based on the current position and orientation of the image pickup device 17 specified by the tracking unit 103 (S5).
  • The bundle adjustment unit 104 executes the bundle adjustment process (S6). Specifically, based on the initial values of the three-dimensional coordinates of points on the structures around the imaging device 17 generated from the BIM information, the position and orientation of the imaging device 17 specified by the tracking unit 103, and the captured image, the bundle adjustment unit 104 calculates the distance from the imaging device 17 to surrounding objects and estimates the position and orientation of the imaging device 17 and the three-dimensional coordinates of the surrounding objects. In addition, the bundle adjustment unit 104 generates map information based on the estimated three-dimensional coordinates of the surrounding objects.
  • the bundle adjustment unit 104 stores the estimated self-position and the generated map information in, for example, the auxiliary storage device 14.
  • The movement control unit 105 searches for a movement route based on the map information stored in the auxiliary storage device 14 and the current self-position, and controls the moving device 16 based on the search result to move the information processing device 1.
  • the movement control unit 105 determines whether or not to end the movement of the information processing device 1 (S7).
  • For example, the movement control unit 105 determines that the movement of the information processing device 1 is to be ended when the information processing device 1 arrives at a predetermined end point.
  • The conditions for determining the end of movement are not particularly limited. For example, the movement control unit 105 may determine that the movement of the information processing device 1 is to be ended when it receives an instruction to end the movement from the outside via the communication network 3.
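  • As an illustration only, the S1 to S7 flow of FIG. 6 can be summarized in pseudocode as below; the objects and method names stand in for the units described above and are hypothetical, and the loop reflects the repeated detection, estimation, and movement described in this embodiment.

```python
def run_self_localization_and_mapping(acquisition, conversion, tracking,
                                      bundle_adjustment, movement_control,
                                      storage):
    """Hypothetical sketch of the S1-S7 flow in FIG. 6 (illustrative names only)."""
    bim_info = acquisition.acquire_bim_information()           # S1: acquire BIM information
    storage.store(bim_info)
    movement_control.start_moving()                             # S2: start moving

    while True:
        image, imu = acquisition.acquire_detection_result()     # S3: captured image + IMU
        pose = tracking.track(image, imu)                       # S4: current position/orientation
        init_values = conversion.to_initial_values(bim_info, pose)          # S5: initial coordinates
        pose, map_info = bundle_adjustment.adjust(image, pose, init_values)  # S6: bundle adjustment
        storage.store(pose, map_info)
        movement_control.move_along(map_info, pose)             # route search and movement

        if movement_control.should_stop():                      # S7: end of movement?
            break
```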
  • As described above, the information processing device 1 of the present embodiment executes self-position estimation and map information generation based on the BIM information and the images captured around the information processing device 1. Therefore, according to the information processing device 1 of the present embodiment, using the BIM information in the self-position estimation and map information generation processing makes it possible to improve the accuracy of the self-position estimation and of the map information.
  • The information processing device 1 of the present embodiment converts the BIM information into an input value for at least one of the self-position estimation process and the map information generation process by the SLAM processing unit 120, and executes self-position estimation and map information generation based on that input value, so that the accuracy of the self-position estimation and the map information can be improved compared with self-position estimation and map information generation based only on peripheral detection results such as captured images.
  • The information processing device 1 of the present embodiment identifies the position and orientation of the imaging device 17 by tracking a plurality of captured images captured at different times, and obtains the distance from the imaging device 17 to surrounding objects based on the BIM information.
  • In the bundle adjustment process, the information processing device 1 of the present embodiment uses the distance from the imaging device 17 to the surrounding object based on the BIM information as the initial value or the range of initial values.
  • Here, the distance from the imaging device 17 to the surrounding object based on the BIM information is the distance from the imaging device 17 to the surrounding object expressed in three-dimensional coordinates.
  • In general, the initial value of the three-dimensional coordinates of a point in three-dimensional space may have to be assumed to be at infinity.
  • In that case, the bundle adjustment process starts without knowing whether the distance between the point in three-dimensional space and the imaging device is 1 m or 1000 m, so the amount of calculation required until the calculation result converges may increase.
  • Since the information processing device 1 of the present embodiment uses initial values based on the BIM information, the processing result can be converged with a smaller amount of calculation.
  • Further, the information processing device 1 of the present embodiment estimates the position of the plane or curved surface of a surrounding object based on the BIM information, and calculates the distance from the imaging device 17 to the surrounding objects based on the constraint condition that a plurality of points existing in the surroundings are located on that plane or curved surface. Therefore, according to the information processing device 1 of the present embodiment, the amount of calculation can be reduced compared with the case where the positions of a plurality of points existing on the same plane or curved surface are obtained separately.
  • The information processing device 1 of the present embodiment calculates the positions of surrounding objects as the spatial coordinates of the plurality of points 50 in three-dimensional space, and outputs the calculated spatial coordinates of the plurality of points 50 as map information. According to the information processing device 1 of the present embodiment, more accurate map information can be provided by outputting, as map information, the positions of surrounding objects calculated by the bundle adjustment process using the BIM information.
  • the information processing device 1 may be a robot or the like having functions such as monitoring, security, cleaning, and delivery of luggage. In this case, the information processing device 1 realizes various functions by moving the building 9 based on the estimated self-position and map information. Further, the map information generated by the information processing device 1 may be used not only for generating the movement route of the information processing device 1 itself, but also for monitoring or managing the building 9 from a remote location. Further, the map information generated by the information processing device 1 may be used to generate a movement route of a robot or drone other than the information processing device 1.
  • the image pickup device 17 is not limited to the stereo camera.
  • The imaging device 17 may be an RGB-D camera having an RGB (Red Green Blue) camera and a depth camera for three-dimensional measurement, a monocular camera, or the like.
  • the sensor included in the information processing device 1 is not limited to the IMU sensor 18, and a gyro sensor, an acceleration sensor, a magnetic sensor, or the like may be individually provided.
  • the SLAM processing unit 120 executes the image SLAM (Visual SLAM) using the captured image, but the SLAM that does not use the captured image may be adopted.
  • The information processing device 1 may detect surrounding structures by Lidar (Light Detection and Ranging, or Laser Imaging Detection and Ranging) or the like instead of the imaging device 17.
  • the SLAM processing unit 120 may specify the position and orientation of the information processing device 1 based on the distance measurement result by Lidar.
  • the SLAM processing unit 120 is supposed to generate three-dimensional map information, but it may be possible to generate two-dimensional map information.
  • the equations (1) to (6) illustrated in the present embodiment are examples, and the mathematical expressions used in the tracking process or the bundle adjustment process are not limited to these.
  • The bundle adjustment unit 104 may perform bundle adjustment according to equation (4) without imposing the constraint conditions of equations (5) and (6). Further, the tracking process may be performed without using the neighborhood pattern N_p.
  • the SLAM processing unit 120 may estimate its own position and generate map information by a method other than tracking processing or bundle adjustment processing.
  • the tracking unit 103 is used as an example of the specific unit, but a method of specifying a change in the position and posture of the information processing device 1 by a method other than tracking may be adopted.
  • various processes for improving the accuracy of self-position estimation or map information may be added to the SLAM process.
  • the SLAM processing unit 120 may further execute a loop closing process or the like.
  • A part or all of the information processing device 1 in the above-described embodiment may be configured by hardware, or may be configured by information processing of software (a program) executed by a CPU, a GPU, or the like.
  • In the case of information processing by software, the software that realizes at least a part of the functions of each device in the above-described embodiment may be stored in a non-transitory storage medium (non-transitory computer-readable medium) such as a flexible disk, a CD-ROM (Compact Disc Read-Only Memory), or a USB memory, and may be loaded into a computer to process the information.
  • the software may be downloaded via a communication network.
  • information processing may be executed by hardware by implementing the software in a circuit such as an ASIC or FPGA.
  • the type of storage medium that stores the software is not limited.
  • the storage medium is not limited to a removable one such as a magnetic disk or an optical disk, and may be a fixed storage medium such as a hard disk or a memory. Further, the storage medium may be provided inside the computer or may be provided outside the computer.
  • The information processing device 1 described above includes one of each component, but may include a plurality of the same component.
  • The software may be installed on a plurality of computers, and each of the plurality of computers may execute the same or a different part of the processing of the software. In this case, it may be a form of distributed computing in which each computer communicates via the network interface 13 or the like to execute processing. That is, the information processing device 1 in the above-described embodiment may be configured as a system that realizes functions by one or a plurality of computers executing instructions stored in one or a plurality of storage devices. Further, the information transmitted from a terminal may be processed by one or a plurality of computers provided on the cloud, and the processing result may be transmitted to the terminal.
  • Various operations of the information processing device 1 in the above-described embodiment may be executed in parallel processing by using one or a plurality of processors or by using a plurality of computers via a network. Further, various operations may be distributed to a plurality of arithmetic cores in the processor and executed in parallel processing. In addition, some or all of the processes, means, etc. of the present disclosure may be executed by at least one of a processor and a storage device provided on the cloud capable of communicating with the information processing device 1 via the network. As described above, each device in the above-described embodiment may be in the form of parallel computing by one or a plurality of computers.
  • the information processing device 1 in the above-described embodiment may be realized by one or a plurality of processors 11.
  • the processor 11 may refer to one or more electronic circuits arranged on one chip, or may refer to one or more electronic circuits arranged on two or more chips or two or more devices. You may point. When a plurality of electronic circuits are used, each electronic circuit may communicate by wire or wirelessly.
  • A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor.
  • When the information processing device 1 in the above-described embodiment is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to that storage device (memory), it may include a configuration in which at least one of the plurality of processors is connected (coupled) to the at least one storage device (memory). Further, this configuration may be realized by storage devices (memories) and processors included in a plurality of computers. Further, a configuration in which the storage device (memory) is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.
  • the external device 2 is not limited to the server device. Further, the external device 2 may be provided in a cloud environment. Further, the external device 2 may be used as an example of the information processing device in the claims.
  • the external device 2 may be an input device.
  • the device interface 15 may be connected not only to the mobile device 16, the image pickup device 17, and the IMU sensor 18, but also to the input device.
  • the input device is, for example, a device such as a camera, a microphone, a motion capture, various sensors, a keyboard, a mouse, or a touch panel, and gives the acquired information to the information processing device 1.
  • it may be a device including an input unit, a memory and a processor such as a personal computer, a tablet terminal, or a smartphone.
  • the external device 2 may be an output device.
  • the device interface 15 may be connected to the output device.
  • The output device may be, for example, a display device such as an LCD (Liquid Crystal Display), a CRT (Cathode Ray Tube), a PDP (Plasma Display Panel), or an organic EL (Electro Luminescence) panel, or it may be a speaker or the like that outputs audio. Further, it may be a device including an output unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
  • the external device 2 may be a storage device (memory). Further, the device interface 15 may be connected to a storage device (memory).
  • the external device 2 may be a network storage or the like, and a storage such as an HDD may be connected to the device interface 15.
  • the external device 2 or the external device connected to the device interface 15 may be a device having some functions of the components of the information processing device 1 in the above-described embodiment. That is, the information processing device 1 may transmit or receive a part or all of the processing results of the external device 2 or the external device connected to the device interface 15.
  • The information processing device 1 may be constantly connected to the external device 2 via the communication network 3, but is not limited to this.
  • the information processing device 1 may take the connection with the external device 2 offline while executing the self-position estimation process and the map information generation process.
  • In the above, the world coordinates of the points included in the building 9, specified by the conversion unit 102 from the BIM information, are used as an example of the information regarding the positions of surrounding objects, but the information regarding the positions of surrounding objects is not limited to this.
  • the conversion unit 102 may generate information indicating the distance between the information processing device 1 and a surrounding object based on the BIM information and the position and orientation of the image pickup device 17.
  • As the position and orientation of the imaging device 17, for example, the position and orientation specified by the tracking unit 103 can be adopted.
  • In this case, the distance between a point X_i in the three-dimensional space and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 is specified by the information indicating the distance generated by the conversion unit 102.
  • The bundle adjustment unit 104 sets a value for each parameter so that the distance between the point X_i in the three-dimensional space and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 matches the distance indicated by the information generated by the conversion unit 102. Also in this method, the positions of the point X_i and the imaging device 17 in the three-dimensional space obtained as a result of the error adjustment by bundle adjustment may differ from the result obtained from the BIM data.
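  • The following Python fragment is an illustrative sketch (not part of the original description) of how such a distance derived from design data could enter a least-squares adjustment as an extra residual term. The stand-in reprojection residuals, the weight w_dist, the numerical values, and the use of scipy are assumptions.

```python
# Minimal sketch: adding BIM-derived point-to-camera distances as extra
# residuals in a least-squares adjustment. Only a single point X_i and two
# camera translations t_j, t_j1 are optimized; all data values are hypothetical.
import numpy as np
from scipy.optimize import least_squares

d_bim = np.array([2.0, 1.6])   # distances X_i <-> camera j, j+1 from design data [m]
w_dist = 5.0                    # weight of the distance residuals

def residuals(params):
    X_i  = params[0:3]
    t_j  = params[3:6]
    t_j1 = params[6:9]
    r = []
    # Stand-ins for reprojection residuals (they anchor the variables):
    r.extend(X_i - np.array([2.0, 0.1, 0.0]))
    r.extend(t_j - np.array([0.0, 0.0, 0.0]))
    r.extend(t_j1 - np.array([0.5, 0.0, 0.0]))
    # BIM-derived constraints: point-to-camera distances should match d_bim.
    r.append(w_dist * (np.linalg.norm(X_i - t_j)  - d_bim[0]))
    r.append(w_dist * (np.linalg.norm(X_i - t_j1) - d_bim[1]))
    return np.asarray(r)

sol = least_squares(residuals, x0=np.zeros(9))
X_i, t_j, t_j1 = sol.x[0:3], sol.x[3:6], sol.x[6:9]
print(np.linalg.norm(X_i - t_j), np.linalg.norm(X_i - t_j1))
```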
  • the conversion unit 102 may specify the dimension of the building 9 from the BIM information, and the dimension may be used as an example of information regarding the position of a surrounding object.
  • the bundle adjustment unit 104 can reduce the amount of calculation by performing bundle adjustment using the information regarding the position of the surrounding object converted from the BIM information by the conversion unit 102.
  • In the first embodiment described above, the environmental information is three-dimensional design information such as BIM information or 3D CAD data.
  • In the second embodiment, the environmental information further includes at least one of the entry/exit information of a person in the building 9 and the image recognition result of a person in the captured image.
  • That is, the environmental information may include both the entry/exit information of a person in the building 9 and the image recognition result of a person in the captured image, or may include either one of them.
  • FIG. 7 is a block diagram showing an example of the functions included in the information processing device 1 according to the second embodiment.
  • the information processing apparatus 1 of the present embodiment includes an acquisition unit 1101, a conversion unit 1102, a SLAM processing unit 1120, and a movement control unit 105.
  • the SLAM processing unit 1120 includes a tracking unit 1103 and a bundle adjusting unit 1104.
  • the conversion unit 1102 includes an initial value generation unit 106 and a mask information generation unit 107.
  • the movement control unit 105 has the same function as that of the first embodiment.
  • the acquisition unit 1101 of the present embodiment has the same function as that of the first embodiment, and acquires the entry / exit information of the person in the building 9.
  • the entry / exit information is information indicating the number of people entering / exiting each room or floor of the building 9 and the time of entering / exiting.
  • a sensor for detecting entry / exit is installed at the entrance / exit of a room or floor of the building 9, and the detection result by the sensor is transmitted to the external device 2.
  • the acquisition unit 1101 acquires the entry / exit information from the external device 2.
  • the method of detecting the entry / exit of a person is not limited to the sensor.
  • the entry / exit information may be a reading record of a security card by a card reader or a detection result of a person from an image captured by a surveillance camera installed in a building 9.
  • the acquisition unit 1101 stores the acquired entry / exit information in the auxiliary storage device 14.
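  • As a purely illustrative sketch (not part of the original description), the following Python fragment shows one possible representation of such entry/exit records and an occupancy check derived from them; the field names and the counting rule are assumptions.

```python
# Sketch of a possible entry/exit record and an occupancy check.
# The rule "occupied if entries exceed exits up to time t" is an assumption.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class EntryExitRecord:
    area_id: str         # room or floor identifier in building 9
    timestamp: datetime  # time of the entry or exit event
    delta: int           # +1 for an entry, -1 for an exit

def occupied(records, area_id, at_time):
    """Return True if anyone is presumed to be in the area at `at_time`."""
    count = sum(r.delta for r in records
                if r.area_id == area_id and r.timestamp <= at_time)
    return count > 0

records = [
    EntryExitRecord("floor-3", datetime(2021, 4, 1, 9, 0), +1),
    EntryExitRecord("floor-3", datetime(2021, 4, 1, 11, 30), -1),
]
print(occupied(records, "floor-3", datetime(2021, 4, 1, 10, 0)))  # True
```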
  • The conversion unit 1102 of the present embodiment has the same functions as those of the first embodiment, and further generates, based on the environmental information, mask information representing an area that is excluded from the target of map information generation in the building 9 in which the information processing device 1 is located.
  • the environmental information includes at least one of the entry / exit information and the image recognition result of the person.
  • the image recognition result of a person is a result of recognizing a person by image processing from the captured image captured by the image pickup device 17.
  • the environmental information of the present embodiment includes both the entry / exit information and the image recognition result, and the same three-dimensional design information as that of the first embodiment.
  • the conversion unit 1102 includes an initial value generation unit 106 and a mask information generation unit 107.
  • the initial value generation unit 106 has the same function as the conversion unit 102 in the first embodiment.
  • the mask information is information representing an area excluded from the target of map information generation.
  • The mask information generation unit 107 determines the area where a person is located in the building 9 based on the entry/exit information or the image recognition result of the person, and sets the determined area as an area to be excluded from the target of map information generation.
  • For example, the mask information generation unit 107 recognizes a person by image processing from the captured image captured by the imaging device 17. Further, when it is difficult to determine by the image processing whether or not an object depicted in the captured image is a person, the mask information generation unit 107 determines, based on the entry/exit information, whether or not a person existed in the room or floor where the captured image was captured at the time when the captured image was captured. When the mask information generation unit 107 determines that a person existed in the room or floor where the captured image was captured at that time, it is more probable that the object depicted in the captured image is a person than when it determines that no person existed.
  • FIG. 8 is an image diagram showing an example of the positional relationship between the information processing device 1 and surrounding objects according to the second embodiment.
  • A person 70 exists in the room of the building 9 in which the information processing device 1 is located. Unlike the pillars 90a to 90c, the person 70 moves, so if the presence of the person 70 is included in the map information, the accuracy of the map information may decrease.
  • the mask information generation unit 107 generates mask information representing the area 80 in which the person 70 exists.
  • the mask information represents, for example, the area 80 in which the person 70 exists in three-dimensional coordinates.
  • In the present embodiment, the mask information generation unit 107 generates the mask information using both the entry/exit information and the image recognition result of the person, but the mask information may be generated based on only one of them.
  • The mask information generation unit 107 may also detect, by image recognition from the image captured by the imaging device 17, a moving body such as a vehicle, or an object temporarily existing in the building 9 such as a dolly or a piece of equipment. In this case, the mask information generation unit 107 sets the area where it is determined that these objects exist as an area to be excluded from the target of map information generation.
  • the mask information generation unit 107 generates mask information representing an area to be excluded from the target of map information generation, and sends it to the SLAM processing unit 1120.
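  • The following Python fragment is an illustrative sketch only (not part of the original description) of how such mask information might be assembled from person detections and the entry/exit information; the inputs, the bounding-box format, and the decision rule for low-confidence detections are assumptions.

```python
# Sketch: building a binary mask that marks image regions to be excluded
# from map generation. Person bounding boxes and the occupancy flag are
# assumed inputs (e.g. from an image recognizer and the entry/exit records).
import numpy as np

def build_mask(image_shape, person_boxes, area_occupied, low_conf_boxes=()):
    """Return a mask with 0 where map generation should be suppressed.

    person_boxes:   boxes (x0, y0, x1, y1) confidently recognized as people.
    low_conf_boxes: ambiguous boxes; they are masked only when the entry/exit
                    information says the area is occupied.
    """
    mask = np.ones(image_shape[:2], dtype=np.uint8)
    boxes = list(person_boxes)
    if area_occupied:
        boxes += list(low_conf_boxes)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 0
    return mask

mask = build_mask((480, 640), [(100, 120, 180, 400)], area_occupied=True,
                  low_conf_boxes=[(300, 150, 360, 380)])
print(mask.sum(), "pixels remain usable")
```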
  • the SLAM processing unit 1120 of the present embodiment has the functions of the first embodiment and does not generate map information for the area corresponding to the mask information.
  • When the tracking unit 1103 of the SLAM processing unit 1120 of the present embodiment performs the tracking process by the same equation (1) as in the first embodiment, if the neighborhood pattern N_p is an image area represented by the mask information, the tracking unit 1103 multiplies the neighborhood pattern N_p by a mask value.
  • the mask value is, for example, “0” or “1”, but is not limited thereto.
  • the method of applying the mask is not limited to this, and other methods may be adopted.
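  • As a purely illustrative sketch (not the patent's equation (1)), the following Python fragment shows one generic way a neighborhood pattern N_p could be weighted by a mask value so that masked pixels do not contribute to the tracking cost; the cost function and the values are assumptions.

```python
# Sketch: weighting the photometric cost of a neighbourhood pattern N_p
# by a mask value, so that masked pixels do not contribute to tracking.
import numpy as np

def masked_patch_cost(ref_patch, cur_patch, mask_patch):
    """ref_patch, cur_patch: intensities of N_p in two images.
    mask_patch: 1 for usable pixels, 0 for pixels inside masked regions."""
    diff = (ref_patch - cur_patch) * mask_patch
    n = mask_patch.sum()
    return (diff ** 2).sum() / max(n, 1)

ref = np.random.rand(8, 8)
cur = ref + 0.01 * np.random.randn(8, 8)
mask = np.ones((8, 8)); mask[:, :3] = 0  # left three columns of the patch are masked
print(masked_patch_cost(ref, cur, mask))
```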
  • the bundle adjustment unit 1104 of the SLAM processing unit 1120 of the present embodiment has the functions of the first embodiment, and the area corresponding to the mask information is excluded from the bundle adjustment.
  • FIG. 9 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the second embodiment.
  • the process of acquiring the BIM information in S1 is the same as that of the first embodiment described with reference to FIG.
  • The acquisition unit 1101 acquires the entry/exit information (S21).
  • the acquisition unit 1101 stores the acquired entry / exit information in the auxiliary storage device 14.
  • The process from the start of the movement of the information processing device 1 in S2 to the acquisition of the captured image and the sensing results such as angular velocity and acceleration in S3 is the same as in the first embodiment.
  • the mask information generation unit 107 of the conversion unit 1102 of the present embodiment generates mask information representing an area to be excluded from the target of map information generation based on the entry / exit information or the image recognition result of the person 70 ( S22).
  • the tracking unit 1103 of the SLAM processing unit 1120 of the present embodiment identifies the current position and orientation of the image pickup device 17 based on the captured image (S4). At this time, the tracking unit 1103 excludes the area corresponding to the mask information from the tracking process.
  • The initial value generation unit 106 of the conversion unit 1102 of the present embodiment generates, based on the current position and orientation of the imaging device 17 specified by the tracking unit 1103 and on the BIM information, the initial values of the three-dimensional coordinates of points in the structures around the imaging device 17 (S5). Since the area corresponding to the mask information is not subject to the generation of map information, the initial value generation unit 106 does not generate the initial values of the three-dimensional coordinates of points in the structures in the area corresponding to the mask information.
  • the bundle adjustment unit 1104 executes the bundle adjustment process (S6).
  • the bundle adjustment unit 1104 excludes the area corresponding to the mask information from the bundle adjustment.
  • the process of determining whether or not to end the movement of the information processing device 1 in S7 is the same as that of the first embodiment.
  • The acquisition unit 1101 acquires the latest entry/exit information again (S23), and the process returns to S3.
  • As described above, the information processing device 1 of the present embodiment generates, based on the environmental information, mask information representing the area in the building 9 excluded from the target of map information generation, and does not generate map information for the area corresponding to the mask information. Therefore, according to the information processing device 1 of the present embodiment, things that may reduce the accuracy of the map information, such as the person 70 and temporarily existing objects, can be excluded, so that the accuracy of the map information can be improved.
  • the environmental information includes the entry / exit information of the person 70 in the building 9, or the image recognition result of the person 70 in the image captured by the image pickup device 17 mounted on the information processing device 1.
  • The information processing device 1 of the present embodiment determines the area where the person 70 is located in the building 9 based on the entry/exit information or the image recognition result of the person 70, and sets the determined area as an area to be excluded from the target of map information generation.
  • the information processing device 1 can improve the accuracy of the map information by not reflecting the worker in the map information. Further, with such a configuration, the information processing apparatus 1 of the present embodiment can robustly execute processing even when the surrounding environment changes depending on a person or the like.
  • In the present embodiment, the bundle adjustment process is performed based on the environmental information as in the first embodiment, but the information processing device 1 of the second embodiment does not have to have all the functions of the first embodiment.
  • the information processing device 1 may use the environmental information only for generating the mask information and may not use it for the bundle adjustment process.
  • the environmental information does not have to include the three-dimensional design information.
  • The timing of using the mask information is not limited to this.
  • mask information based on entry / exit information at a past time may be used.
  • FIG. 10 is a block diagram showing an example of the functions included in the information processing device 1 according to the third embodiment.
  • the information processing apparatus 1 of the present embodiment includes an acquisition unit 1101, a marker detection unit 108, a calibration unit 109, a conversion unit 2102, a SLAM processing unit 2120, and a movement control unit 105.
  • the SLAM processing unit 2120 includes a tracking unit 2103 and a bundle adjustment unit 1104.
  • the conversion unit 2102 includes an initial value generation unit 1106 and a mask information generation unit 1107.
  • the movement control unit 105 has the same functions as those of the first and second embodiments.
  • the acquisition unit 1101 has the same function as that of the second embodiment.
  • the marker detection unit 108 detects the AR marker from the captured image.
  • the AR marker has, for example, information on three-dimensional coordinates representing the position where the AR marker is described.
  • the three-dimensional coordinates are consistent with the coordinate system in the BIM information.
  • That is, the AR marker represents, in the coordinate system of the BIM information, the position where the AR marker is installed.
  • the AR marker is an example of index information in this embodiment. It is assumed that the AR marker is installed on a wall, a pillar, or the like along the passage of the building 9. Specifically, the AR marker is, for example, a QR code (registered trademark) or the like, but is not limited thereto. The number of AR markers is not particularly limited, but it is assumed that a plurality of AR markers are installed per building 9. Further, the marker detection unit 108 is an example of the index detection unit in the present embodiment. The marker detection unit 108 sends the detection result of the AR marker to the calibration unit 109.
  • Based on the detection result of the AR marker, the calibration unit 109 adjusts the coordinate system representing the self-position held internally so as to match the coordinate system of the BIM information.
  • the calibration unit 109 is an example of the coordinate adjustment unit in this embodiment.
  • the locus of change in self-position estimated by the SLAM processing unit 2120 is stored in the auxiliary storage device 14, but an error in self-position may be accumulated as the information processing device 1 moves.
  • the calibration unit 109 eliminates the accumulation of such errors by adjusting the current position of the information processing device 1 based on the three-dimensional coordinates represented by the AR marker detected by the marker detection unit 108.
  • the calibration unit 109 sends the calibration result to the conversion unit 2102.
  • More specifically, the calibration unit 109 sends a transformation matrix for correcting the self-position to the conversion unit 2102.
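  • The following Python fragment is an illustrative sketch, not part of the original description: it shows one common way a correction transform could be computed from a marker whose pose is known in the BIM coordinate system and applied to the internally held self-position. The use of 4x4 homogeneous matrices and the numerical values are assumptions.

```python
# Sketch: using a detected AR marker whose pose is known in the BIM
# coordinate system to correct accumulated drift of the internal SLAM pose.
import numpy as np

def correction_from_marker(T_marker_in_bim, T_marker_in_slam):
    """Transformation that maps internal SLAM coordinates to BIM coordinates."""
    return T_marker_in_bim @ np.linalg.inv(T_marker_in_slam)

def apply_correction(T_correction, T_robot_in_slam):
    return T_correction @ T_robot_in_slam

# Hypothetical values: the marker sits 5 m along x in BIM coordinates,
# while SLAM has drifted and places it at 4.8 m.
T_marker_bim = np.eye(4);  T_marker_bim[0, 3] = 5.0
T_marker_slam = np.eye(4); T_marker_slam[0, 3] = 4.8
T_corr = correction_from_marker(T_marker_bim, T_marker_slam)
T_robot_slam = np.eye(4);  T_robot_slam[0, 3] = 4.0
print(apply_correction(T_corr, T_robot_slam)[:3, 3])  # robot position in BIM frame
```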
  • The conversion unit 2102 of the present embodiment has the same functions as those of the first and second embodiments, and further converts the environmental information into input values of the self-position estimation process or the map information generation process by the SLAM processing unit 2120 based on the self-position adjusted by the calibration unit 109.
  • The initial value generation unit 1106 has the same function as that of the second embodiment, and further generates the initial values of the bundle adjustment process, or input values representing the range of the initial values, based on the position and orientation of the imaging device 17 specified by the tracking unit 2103 and the self-position adjusted by the calibration unit 109. More specifically, the initial value generation unit 1106 specifies, by the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the initial values of the bundle adjustment process or the input values representing the range of the initial values.
  • the mask information generation unit 1107 has the same function as that of the second embodiment, and generates mask information based on the self-position adjusted by the calibration unit 109.
  • More specifically, the mask information generation unit 1107 specifies, by the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the mask information.
  • The bundle adjustment unit 1104 of the SLAM processing unit 2120 has the same functions as those of the first and second embodiments, and uses, for the bundle adjustment, the initial values of the bundle adjustment process or the range of the initial values generated by the conversion unit 2102 based on the self-position adjusted by the calibration unit 109.
  • FIG. 11 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the third embodiment.
  • the process from the process of acquiring the BIM information of S1 to the process of acquiring the captured image and the sensing result of S3 is the same as that of the second embodiment.
  • the marker detection unit 108 detects the AR marker from the captured image (S31).
  • the marker detection unit 108 sends the detection result of the AR marker to the calibration unit 109.
  • The calibration unit 109 executes the calibration process based on the detection result of the AR marker (S32). For example, the calibration unit 109 generates a transformation matrix for adjusting the self-position to the coordinate system of the BIM information. The calibration unit 109 sends the generated transformation matrix to the conversion unit 2102.
  • The mask information generation unit 1107 specifies, by the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and generates the mask information (S22).
  • the tracking process of S4 is the same as that of the first and second embodiments, but the calibration result by the calibration unit 109 may also be used in the process.
  • the calibration unit 109 sends the calibration result to the conversion unit 2102, but may further send the calibration result to the SLAM processing unit 2120.
  • the tracking unit 2103 of the SLAM processing unit 2120 executes the tracking process using the three-dimensional coordinates based on the calibration result.
  • The initial value generation unit 1106 specifies, by the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and generates the initial values of the bundle adjustment process or the input values representing the range of the initial values (S5).
  • the bundle adjustment unit 1104 may execute the tracking process and the bundle adjustment process using the three-dimensional coordinates based on the calibration result.
  • As described above, the information processing device 1 of the present embodiment detects, from the captured image or other detection results, the index information whose position is represented in the coordinate system of the BIM information, and converts the environmental information into input values of the self-position estimation process or the map information generation process by the SLAM processing unit 2120 based on the coordinate system adjusted using the index information. Therefore, according to the information processing device 1 of the present embodiment, the error between the BIM information and the internal SLAM coordinate system of the information processing device 1 is reduced, and self-position estimation and map information generation can be performed with higher accuracy.
  • the AR marker is illustrated as the index information, but the index information is not limited to this.
  • the index information may be a sign or the like that can be captured by Lidar or various sensors, or may be a beacon or the like.
  • In the present embodiment, the information processing device 1 is described as having the functions of both the first embodiment and the second embodiment, but the information processing device 1 of the present embodiment does not have to have all the functions of the first and second embodiments.
  • the information processing apparatus 1 may use the environmental information only for either bundle adjustment or generation of mask information. Further, the environmental information may include any of three-dimensional design information, entry / exit information, and image recognition result of a person.
  • the information processing apparatus 1 uses the captured image for recognizing a person, but the use of the captured image is not limited to this.
  • The information processing device 1 segments the captured image based on the recognition result of the objects depicted in the captured image, and performs SLAM processing based on the segmentation result.
  • the information processing device 1 of the present embodiment includes an acquisition unit 101, a conversion unit 102, a SLAM processing unit 120, and a movement control unit 105.
  • the acquisition unit 101 has the same function as that of the first embodiment. Specifically, the acquisition unit 101 acquires an captured image from the imaging device 17 via the device interface 15.
  • the conversion unit 102 has the same function as that of the first embodiment, and then segments the captured image acquired by the acquisition unit 101 based on the recognition result of the object drawn on the captured image.
  • FIG. 12 is a diagram showing an example of segmentation of the captured image 60 according to the fourth embodiment.
  • the captured image 60 depicts the environment around the information processing device 1.
  • the conversion unit 102 recognizes the image area in which the object is drawn and the type of each object from the captured image 60.
  • the recognition result of the object is information in which the two-dimensional coordinates of the image area in which the object is drawn and the type of each object are associated with each other.
  • the environmental information includes at least the captured image 60.
  • the segmentation result of the captured image 60 may be used as an example of the environmental information instead of the captured image 60 itself.
  • the conversion unit 102 recognizes the object depicted in the captured image 60.
  • The objects include structures such as walls and pillars, fixtures, furniture, moving bodies, temporarily placed objects, people, and the like.
  • the conversion unit 102 recognizes the individual objects drawn on the captured image 60 by inputting the captured image 60 into the trained model configured by, for example, a neural network or the like.
  • the captured image 60 depicts a person 70, boxes 75a and 75b, a pillar 90, a wall 91, and a floor 92.
  • the conversion unit 102 recognizes these objects.
  • "People,” “boxes,” “pillars,” “walls,” and “floors” are examples of object types.
  • the recognition of the person 70 and the recognition of other objects may be executed separately.
  • The conversion unit 102 segments the captured image 60 based on the recognition result of the objects. On the right side of FIG. 12, the segmentation result 61 of the captured image 60 is shown. In the example shown in FIG. 12, the conversion unit 102 sets the image area in which the pillar 90 and the wall 91 are depicted as area A1, the image area in which the floor 92 is depicted as area A2, the image area in which the boxes 75a and 75b are depicted as area A3, and the image area in which the person 70 is depicted as area A4, thereby segmenting the captured image 60 into four areas.
  • The unit of division is not limited to the example shown in FIG. 12. Hereinafter, when the areas A1 to A4 are not particularly distinguished, they are simply referred to as areas A.
  • the conversion unit 102 classifies the recognized objects, that is, the person 70, the boxes 75a and 75b, the pillar 90, the wall 91, and the floor 92 according to whether or not they are permanent objects.
  • the pillar 90, the wall 91, and the floor 92 are permanent objects because they are part of the building 9.
  • the person 70 and the boxes 75a and 75b are non-permanent objects. Whether or not each object is permanently installed is determined by, for example, a trained model.
  • a permanently installed object is an object that does not move from the installation position once it is installed.
  • an object that is a part of the building 9, such as the pillar 90, the wall 91, and the floor 92, is basically a permanent object because it does not move.
  • An object that is not permanently installed is an object that is likely to move from the installation position.
  • a person 70, a moving body such as a cart or a forklift, temporarily installed fixtures, luggage boxes 75a, 75b, etc. are non-permanent objects.
  • the conversion unit 102 associates the segmented areas A1 to A4 with whether or not the object drawn in each area is a permanently installed object.
  • the information in which the segmented areas A1 to A4 are associated with whether or not the object drawn in each area is a permanently installed object is referred to as a segmentation result.
  • the conversion unit 102 sends the segmentation result to the SLAM processing unit 120.
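  • The following Python fragment is an illustrative sketch, not part of the original description: it derives a per-pixel "permanently installed" flag from a class label map such as the segmentation result described above. The class names, label ids, and the helper function are assumptions.

```python
# Sketch: turning a per-pixel class label map (the segmentation of the
# captured image) into a per-pixel "permanent object" flag.
import numpy as np

PERMANENT_CLASSES = {"pillar", "wall", "floor"}      # part of building 9
NON_PERMANENT_CLASSES = {"person", "box", "cart"}    # may move

def permanence_map(label_map, id_to_name):
    """label_map: HxW array of class ids; returns HxW bool array,
    True where a permanently installed object is depicted."""
    permanent_ids = {i for i, name in id_to_name.items()
                     if name in PERMANENT_CLASSES}
    return np.isin(label_map, list(permanent_ids))

id_to_name = {0: "floor", 1: "wall", 2: "pillar", 3: "box", 4: "person"}
labels = np.array([[1, 1, 4],
                   [0, 3, 4]])
print(permanence_map(labels, id_to_name))
```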
  • the object recognition and the segmentation process based on the result of the object recognition have been described separately, but these processes may be integrated.
  • a trained model that outputs the segmentation result of the captured image 60 when the captured image 60 is input may be adopted.
  • the conversion unit 102 inputs the captured image 60 into the trained model and obtains the segmentation result output from the trained model.
  • the method of object recognition from the captured image 60 and the segmentation of the captured image 60 is not limited to the above example.
  • the conversion unit 102 may apply a machine learning or deep learning technique other than the neural network to perform object recognition from the captured image 60 and segmentation of the captured image 60.
  • the SLAM processing unit 120 of the present embodiment has the functions of the first embodiment, and then estimates the self-position and generates map information based on the segmentation result of the captured image 60.
  • For example, the SLAM processing unit 120 identifies the three-dimensional space corresponding to the areas A1 and A2 in which permanently installed objects are depicted in the captured image 60, and makes that three-dimensional space the target of the self-position estimation process and the map information generation process. Further, the SLAM processing unit 120 identifies the three-dimensional space corresponding to the areas A3 and A4 in which non-permanently installed objects are depicted in the captured image 60, and excludes that three-dimensional space from the target of the self-position estimation process and the map information generation process. In this case, the information representing the areas A3 and A4 in which non-permanently installed objects are depicted may be used as the mask information representing the area to be excluded from the target of map information generation.
  • Alternatively, the SLAM processing unit 120 may use the areas A1 and A2 of the captured image 60 in which permanently installed objects are depicted for SLAM processing, and may not use the areas A3 and A4 in which non-permanently installed objects are depicted for SLAM processing.
  • the weighting at the time of SLAM processing may be changed depending on whether or not the objects drawn in each of the areas A1 to A4 are permanent objects.
  • For example, the weighting coefficients are set for the areas A1 to A4 so that the weighting coefficient of the areas A1 and A2 in which permanently installed objects are depicted is larger than the weighting coefficient of the areas A3 and A4 in which non-permanently installed objects are depicted in the captured image 60.
  • The SLAM processing unit 120 may also change the weighting coefficient for each type of object within the areas A in which non-permanently installed objects are depicted. For example, even among objects that are not permanently installed, there are objects that are relatively likely to remain at the same position for a long period of time and objects that are relatively unlikely to do so.
  • For example, large fixtures and furniture are classified as non-permanently installed objects because they may move, but compared with the person 70 and the like, they are likely to remain at the same position for a long period of time. Therefore, the SLAM processing unit 120 may set the weighting coefficients so that, among the areas A in which non-permanently installed objects are depicted, the weighting coefficient becomes larger for an area A in which an object with a lower possibility of moving is depicted.
  • the weighting coefficient for each area A1 to A4 may be set by the conversion unit 102 instead of the SLAM processing unit 120.
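  • As an illustrative sketch only (not from the original text), the following Python fragment shows one way such weighting coefficients could be assigned per object class and applied when aggregating a cost; the class names and the concrete values are assumptions.

```python
# Sketch: per-region weighting coefficients for the SLAM cost, larger for
# permanently installed objects and smaller (or zero) for objects that are
# more likely to move. The concrete values are illustrative only.
WEIGHT_BY_CLASS = {
    "wall": 1.0, "pillar": 1.0, "floor": 1.0,   # permanent: full weight
    "furniture": 0.5,                            # non-permanent but rarely moved
    "box": 0.2, "cart": 0.1,                     # temporarily placed objects
    "person": 0.0,                               # excluded from the cost
}

def region_weight(region_class):
    return WEIGHT_BY_CLASS.get(region_class, 0.3)  # default for unknown classes

def weighted_cost(region_costs):
    """region_costs: list of (class_name, residual) pairs."""
    return sum(region_weight(c) * r for c, r in region_costs)

print(weighted_cost([("wall", 0.4), ("person", 5.0), ("box", 1.0)]))
```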
  • As described above, the information processing device 1 of the present embodiment segments the captured image 60 captured by the imaging device 17 based on the recognition result of the objects depicted in the captured image 60, and performs self-position estimation and map information generation based on the result of the segmentation. Therefore, according to the information processing device 1 of the present embodiment, in addition to the effects of the first embodiment, whether or not each part of the captured image 60 is used for self-position estimation and map information generation, or the strength of its influence on self-position estimation and map information generation, can be adjusted according to the object depicted there, so that the accuracy of self-position estimation and the accuracy of the map information can be improved.
  • the SLAM processing unit 120 described above may execute self-position estimation and map information generation based on the segmentation result of the captured image 60 and the three-dimensional design information such as BIM information.
  • In this case, for an object determined not to be a permanently installed object based on the captured image 60, the SLAM processing unit 120 refers to the three-dimensional design information and judges whether or not the object is included in the design of the building 9. If the object determined not to be a permanently installed object based on the captured image 60 is not registered in the three-dimensional design information, the SLAM processing unit 120 adopts the determination result that the object is not a permanently installed object as it is. On the other hand, if the object determined not to be a permanently installed object based on the captured image 60 is registered in the three-dimensional design information, the SLAM processing unit 120 changes the determination result so that the object is treated as a permanently installed object.
  • The conversion unit 102 may express the accuracy of the determination as to whether or not an object depicted in the captured image 60 is a permanently installed object as a percentage or the like.
  • The SLAM processing unit 120 may refer to the three-dimensional design information when the accuracy of the determination as to whether or not an object depicted in the captured image 60 is a permanently installed object is equal to or less than a reference value, and may judge whether or not the object is included in the design of the building 9.
  • the reference value of the determination accuracy is not particularly limited.
  • the process of comparing the three-dimensional design information with the image recognition result may be executed by the conversion unit 102 instead of the SLAM processing unit 120.
  • With such a configuration, the accuracy of the determination result as to whether or not the object depicted in each area A is a permanently installed object can be improved.
  • The SLAM processing unit 120 may use the segmentation result of the captured image 60 in the present embodiment in combination with either or both of the entry/exit information of the person in the second embodiment described above and the image recognition result of the person 70 in the captured image.
  • (Modification 1) In the first embodiment described above, the initial values of the three-dimensional coordinates of points on surrounding objects in the SLAM process are obtained based on three-dimensional design information such as BIM information. In this modification, the three-dimensional coordinates of points calculated from distance values estimated from the captured image 60 of the surroundings of the information processing device 1 are adopted as the initial values of the three-dimensional coordinates of points on surrounding objects in the SLAM process.
  • the conversion unit 102 of the information processing device 1 estimates the distance (depth) between the object drawn on the captured image 60 and the imaging device 17 based on the captured image 60.
  • the estimation process is called a depth estimation process.
  • the environmental information includes at least the captured image 60.
  • the distance information estimated from the captured image 60 may be used as an example of the environmental information instead of the captured image 60 itself.
  • When the imaging device 17 is a stereo camera, the conversion unit 102 calculates the depth based on the parallax (stereoscopic difference) between the captured images captured by the cameras included in the stereo camera.
  • the imaging device 17 may be a monocular camera.
  • In this case, the conversion unit 102 executes the depth estimation process using a machine learning or deep learning technique. For example, the conversion unit 102 may estimate the distance between an object depicted in the captured image 60 and the imaging device 17 by inputting the captured image 60 captured by the monocular camera into a trained model that outputs a depth map corresponding to the captured image 60.
  • the trained model in this modification is, for example, a model that estimates the depth from a monocular image by estimating an image paired with the monocular image from the monocular image using a stereo image as training data.
  • the method of estimating the depth from a monocular image is not limited to this.
  • the conversion unit 102 sends the distance estimated from the captured image 60 to the SLAM processing unit 120 as an input value for SLAM processing. More specifically, the conversion unit 102 sets the three-dimensional coordinates of the points estimated from the distance estimated from the captured image 60 as the initial values in the bundle adjustment process by the bundle adjustment unit 104. The conversion unit 102 may specify the range of the initial value without specifying the initial value as a unique value.
  • When the position of the imaging device 17 deviates from the center of the information processing device 1, the conversion unit 102 or the SLAM processing unit 120 corrects the distance estimated from the captured image 60 based on the deviation of the position of the imaging device 17 from the center of the information processing device 1.
  • According to this modification, the amount of calculation of the bundle adjustment process and the like can be reduced even when no three-dimensional design information is available.
  • Alternatively, the conversion unit 102 may generate the input value for the distance to an object based on both the distance between the object and the imaging device 17 estimated from the captured image 60 and the distance between the information processing device 1 and the object calculated from the three-dimensional design information. For example, the conversion unit 102 may use the three-dimensional coordinates of a point obtained from the average of the distance estimated from the captured image 60 and the distance calculated from the three-dimensional design information as the initial value in the bundle adjustment process, as in the sketch below.
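  • The following Python fragment is an illustrative sketch only (not from the original text) of how a pixel with an estimated depth could be back-projected and blended with the depth derived from the design information to obtain such an initial value; the camera intrinsics and the simple averaging rule are assumptions.

```python
# Sketch: back-projecting a pixel with an estimated depth into 3D and blending
# it with the depth predicted from the three-dimensional design information to
# obtain an initial value for the bundle adjustment.
import numpy as np

K = np.array([[525.0,   0.0, 319.5],   # hypothetical pinhole intrinsics
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])

def back_project(u, v, depth, K):
    """Pixel (u, v) with depth [m] -> 3D point in the camera frame."""
    uv1 = np.array([u, v, 1.0])
    return depth * (np.linalg.inv(K) @ uv1)

def initial_point(u, v, depth_from_image, depth_from_bim, K):
    depth = 0.5 * (depth_from_image + depth_from_bim)  # simple average
    return back_project(u, v, depth, K)

print(initial_point(400, 260, depth_from_image=3.1, depth_from_bim=3.0, K=K))
```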
  • the SLAM processing unit 120 generates a point cloud map, but the mode of the three-dimensional representation is not limited to the point cloud map.
  • the SLAM processing units 120, 1120, and 2120 may generate a set of a plurality of figures having three-dimensional coordinates as map information.
  • FIG. 13 is a diagram showing an example of map information according to the modified example 2.
  • The map information 500 shown in FIG. 13 is obtained by applying a plurality of triangular figures (a triangular-patch cloud) 501a to 501f (hereinafter referred to as triangular patches 501) to the two-dimensional captured image 45.
  • Each triangular patch 501 is a plane figure, but its position and orientation can be changed in three-dimensional space.
  • the orientation of the triangular patch 501 is represented by the normal vector n.
  • the position of the triangular patch 501 is represented by three-dimensional coordinates.
  • the position and orientation of each triangular patch 501 corresponds to the depth of the two-dimensional captured image 45.
  • the SLAM processing unit 120 generates three-dimensional map information by optimizing the positions of the center points and the normal vectors of the plurality of triangular patches 501 applied to the captured image 45.
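  • As an illustrative sketch (not part of the original description), the following Python fragment shows one possible parameterization of such a triangular patch by its center point and normal vector, which an optimizer could adjust; the class layout and values are assumptions.

```python
# Sketch: a triangular patch parameterized by its centre point and its normal
# vector, as one possible building block of the map representation above.
from dataclasses import dataclass
import numpy as np

@dataclass
class TriangularPatch:
    center: np.ndarray   # 3D coordinates of the patch centre
    normal: np.ndarray   # unit normal vector n of the patch plane

    def plane_offset(self):
        """Plane equation n . x = d for points x on the patch."""
        return float(self.normal @ self.center)

patch = TriangularPatch(center=np.array([1.0, 0.5, 3.0]),
                        normal=np.array([0.0, 0.0, 1.0]))
print(patch.plane_offset())   # optimization would adjust center and normal
```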
  • By generating the map information as a set of triangular patches 501 in this way, the information processing device 1 of this modification can generate map information that closely expresses the surrounding environment while reducing the amount of calculation as compared with the case where the three-dimensional coordinates of points in the three-dimensional space are calculated individually.
  • the triangular patch 501 is applied to the captured image 45, but the triangular patch 501 may be applied to the BIM information.
  • the conversion unit 102, 1102, 2102 (hereinafter, referred to as the conversion unit 102) may apply the triangular patch 501 to the three-dimensional design information such as BIM information.
  • the conversion unit 102 can determine the boundary of the triangular patch 501 as the boundary of the three-dimensional structure based on the BIM information, in addition to the boundary drawn as an edge on the captured image.
  • When this configuration is adopted, the SLAM processing unit 120 can generate more accurate map information by correcting, based on the SLAM result, the positions and orientations of the plurality of triangular patches 501 applied by the conversion unit 102.
  • the figures constituting the map information are not limited to the triangular patch 501, and the SLAM processing unit 120 may generate the map information by mesh representation or three-dimensional polygons.
  • In the embodiments described above, the environmental information is the three-dimensional design information, the entry/exit information of a person in the building 9, or the image recognition result of a person in the captured image, but the environmental information is not limited to these.
  • the environmental information includes information on at least one of the ambient lighting and the weather.
  • the information regarding the ambient lighting is, for example, information indicating whether the lighting for each room or floor of the building 9 is on or off.
  • the information on the weather is information on sunshine conditions such as sunny, cloudy, and rainy in the area including the building 9.
  • the environmental information may include both information on ambient lighting and information on weather, or may include only one or the other.
  • the acquisition units 101 and 1101 acquire information on ambient lighting or weather from the external device 2.
  • the conversion unit 102 generates mask information representing a region where the captured image is likely to be deteriorated, based on the information on the ambient lighting or the weather acquired by the acquisition unit 101.
  • the mask information of the second embodiment may be distinguished as the first mask information, and the mask information of the present modification may be distinguished as the second mask information.
  • the SLAM processing unit 120 of this modification does not use the captured image at least for either self-position estimation or map information generation in the region corresponding to the mask information.
  • That is, the SLAM processing unit 120 may refrain from using the captured image for both the self-position estimation process and the map information generation process in the area corresponding to the mask information, or may refrain from using it for only one of the processes.
  • For example, the SLAM processing unit 120 may use the captured image of the area corresponding to the mask information for the self-position estimation process for movement, while not using it for generating the map information.
  • overexposed areas or underexposed areas may occur in the captured image depending on the lighting or sunshine conditions.
  • the use of such areas may reduce the accuracy of self-position estimation or map information.
  • In this modification, the deterioration of the accuracy of the self-position estimation or the map information is reduced by not using the captured image for self-position estimation or map information generation in areas where such an event may occur.
  • The SLAM processing unit 120 may, instead of not using the captured image at all for self-position estimation or map information generation in the area corresponding to the mask information, use it with a lower priority.
  • For example, when the information processing device 1 includes a sensor or the like for detecting the surrounding state in addition to the imaging device 17, the SLAM processing unit 120 uses the detection result of the sensor or the like in preference to the captured image for self-position estimation or map information generation in the area corresponding to the mask information.
  • the conversion unit 102 may change the gradation of the captured image based on the environmental information. For example, the conversion unit 102 reduces overexposure or underexposure by changing the dynamic range of the captured image based on information about ambient lighting or weather.
  • the SLAM processing unit 120 executes self-position estimation and map information generation based on the captured image whose gradation has been changed by the conversion unit 102.
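  • The following Python fragment is an illustrative sketch only (not from the original text) of one simple way the gradation of a captured image could be adjusted according to lighting or weather information, here by condition-dependent gamma correction; the condition keys and gamma values are assumptions.

```python
# Sketch: adjusting the tone of a captured image before SLAM processing based
# on the lighting / weather information. Gamma correction with condition-
# dependent values is one simple possibility; the values are illustrative.
import numpy as np

GAMMA_BY_CONDITION = {
    ("lights_on", "sunny"): 1.0,
    ("lights_on", "cloudy"): 0.9,
    ("lights_off", "sunny"): 0.8,   # brighten dark indoor areas
    ("lights_off", "rainy"): 0.7,
}

def adjust_gradation(image, lighting, weather):
    """image: float array in [0, 1]; returns gamma-corrected image."""
    gamma = GAMMA_BY_CONDITION.get((lighting, weather), 1.0)
    return np.clip(image, 0.0, 1.0) ** gamma

img = np.random.rand(4, 4)
print(adjust_gradation(img, "lights_off", "rainy").max())
```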
  • According to the information processing device 1 of this modification, it is possible to robustly estimate the self-position and generate map information in accordance with the surrounding environment such as lighting conditions or sunshine conditions.
  • the environmental information may include three-dimensional design information and process information representing the construction process of the building 9.
  • the process information in this modification is information representing the construction schedule or timeline (schedule) of the building 9.
  • When the building 9 is in the process of being built, collating the three-dimensional design information such as BIM information with the process information makes it possible to distinguish between areas where the construction of the building 9 has been completed and areas that are still under construction.
  • Since the 3D design information basically represents a 3D model of the building 9 in its completed state, there is a high possibility of a difference between the 3D design information and the actual state of the building 9 in areas that are still under construction.
  • the conversion unit 102 of this modification generates unfinished area information representing an area of the building 9 in which the construction has not been completed, based on the three-dimensional design information and the process information.
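  • As a purely illustrative sketch (not part of the original description), the following Python fragment shows one way unfinished areas could be derived by collating design elements with a construction schedule; the record fields and the completion-date rule are assumptions.

```python
# Sketch: deriving "unfinished area" information by collating design elements
# with the construction schedule (process information).
from dataclasses import dataclass
from datetime import date

@dataclass
class ScheduledElement:
    element_id: str    # id of a BIM element (wall, pillar, ...)
    area_id: str       # room / floor the element belongs to
    completion: date   # scheduled completion date from the process information

def unfinished_areas(elements, today):
    """Areas containing at least one element not yet scheduled to be complete."""
    return {e.area_id for e in elements if e.completion > today}

elements = [
    ScheduledElement("wall-301", "floor-3", date(2021, 3, 1)),
    ScheduledElement("wall-401", "floor-4", date(2021, 6, 1)),
]
print(unfinished_areas(elements, date(2021, 4, 15)))  # {'floor-4'}
```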
  • the SLAM processing unit 120 of this modified example executes self-position estimation and map information generation for the area corresponding to the unfinished area information without using the three-dimensional design information. For example, the SLAM processing unit 120 executes self-position estimation and map information generation based on the detection result of the captured image or the sensor for the region corresponding to the unfinished region information.
  • The information processing device 1 of this modification does not use the three-dimensional design information in areas where there is a high possibility of a difference between the three-dimensional design information and the actual state of the building 9, so that the decrease in the accuracy of self-position estimation and map information can be reduced even while the building 9 is under construction.
  • the SLAM processing unit 120 may use the three-dimensional design information at a lower priority in the area corresponding to the unfinished area information, instead of not using the three-dimensional design information at all.
  • Further, the map information may be generated only in the area corresponding to the unfinished area information. That is, the SLAM processing unit 120 assumes that the structure of the building 9 does not change in areas where the construction has been completed, and reduces the amount of calculation by generating map information only in areas where the structure of the building 9 changes, that is, the area corresponding to the unfinished area information.
  • On the other hand, the tracking unit 103 of the SLAM processing unit 120 may use, in the tracking process, captured images obtained by capturing areas other than the area corresponding to the unfinished area information. This is because, for the area corresponding to the unfinished area information, the structure that is the subject changes due to the construction work, so it may be difficult to track the point 50 between captured images captured at different times.
  • In the embodiments described above, the information processing device 1 executes the self-position estimation process and the map information generation process, but a configuration in which the external device 2 executes the self-position estimation process and the map information generation process may be adopted.
  • the external device 2 may execute the position estimation process of the information processing device 1 and the map information generation process based on the detection result acquired from the information processing device 1 and the environmental information.
  • the external device 2 may be used as an example of the information processing device.
  • the expression "at least one of a, b and c (one)” or “at least one of a, b or c (one)” (including similar expressions). ) Is used, it includes any of a, b, c, ab, ac, bc, or abc. It may also include multiple instances of any element, such as a-a, a-b-b, a-a-b-b-c-c, and the like. It also includes adding elements other than the listed elements (a, b and c), such as having d, such as a-b-c-d.
  • When the terms "connected" and "coupled" are used, they are intended as non-limiting terms that include direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like.
  • The terms should be interpreted as appropriate according to the context in which they are used, but they should not be interpreted restrictively so as to exclude any form of connection/coupling that is not intentionally or naturally excluded.
  • When an expression such as "element A configured to perform operation B" is used, it may include that the physical structure of the element A has a configuration capable of executing the operation B, and that a permanent or temporary setting (setting/configuration) of the element A is configured (set) to actually execute the operation B.
  • For example, when the element A is a general-purpose processor, it suffices that the processor has a hardware configuration capable of executing the operation B and is configured to actually execute the operation B by a permanent or temporary setting of a program (instructions).
  • When the element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it suffices that the circuit structure of the processor is implemented so as to actually execute the operation B, regardless of whether or not control instructions and data are actually attached.
  • Terms relating to finding an optimal value include finding a global optimal value, finding an approximation of a global optimal value, finding a local optimal value, and finding an approximation of a local optimal value, and should be interpreted as appropriate according to the context in which the term is used. They also include probabilistically or heuristically finding approximate values of these optimal values.
  • When a plurality of pieces of hardware perform a predetermined process, the respective pieces of hardware may cooperate to perform the predetermined process, or some of the hardware may perform all of the predetermined process. Further, some hardware may perform a part of the predetermined process, and other hardware may perform the rest of the predetermined process.
  • When expressions such as "one or more pieces of hardware perform a first process and the one or more pieces of hardware perform a second process" are used, the hardware that performs the first process and the hardware that performs the second process may be the same or different. That is, the hardware that performs the first process and the hardware that performs the second process may be included in the one or more pieces of hardware.
  • the hardware may include an electronic circuit, a device including the electronic circuit, or the like.
  • Each storage device (memory) among the plurality of storage devices (memories) may store only a part of the data, or may store the whole of the data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The information processing device according to an embodiment is provided with at least one memory and at least one processor. The at least one processor is configured to be capable of executing: acquiring a detection result and environmental information, the detection result including at least one of the state of the surroundings of the information processing device or the state of the information processing device, the environmental information relating to the environment of the surroundings of the information processing device; and executing estimation of the self-location and generation of map information on the basis of the environmental information and the detection result.

Description

Information processing device, information processing method, and program

An embodiment of the present invention relates to an information processing device, an information processing method, and a program.

Conventionally, robots and the like are known that estimate their own position and generate map information by recognizing the positions and shapes of surrounding objects from the sensing results of sensors or from captured images.

Japanese Unexamined Patent Publication No. 2003-015739

However, in the prior art, it may be difficult to estimate the self-position and generate map information with high accuracy.

The information processing device of an embodiment includes at least one memory and at least one processor. The at least one processor is configured to be capable of acquiring a detection result including at least one of the surrounding state of the information processing device or the state of the information processing device, and environmental information relating to the environment around the information processing device, and of estimating the self-position and generating map information based on the environmental information and the detection result.

FIG. 1 is a block diagram showing an example of the hardware configuration of the information processing apparatus according to the first embodiment.
FIG. 2 is a block diagram showing an example of a function provided in the information processing apparatus according to the first embodiment.
FIG. 3 is an image diagram showing an example of tracking processing according to the first embodiment.
FIG. 4 is an image diagram showing an example of the positional relationship between the information processing apparatus according to the first embodiment and surrounding objects.
FIG. 5 is an image diagram showing an example of bundle adjustment according to the first embodiment.
FIG. 6 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the first embodiment.
FIG. 7 is a block diagram showing an example of the functions provided in the information processing apparatus according to the second embodiment.
FIG. 8 is an image diagram showing an example of the positional relationship between the information processing apparatus according to the second embodiment and surrounding objects.
FIG. 9 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the second embodiment.
FIG. 10 is a block diagram showing an example of the functions provided in the information processing apparatus according to the third embodiment.
FIG. 11 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the third embodiment.
FIG. 12 is a diagram showing an example of segmentation of a captured image according to the fourth embodiment.
FIG. 13 is a diagram showing an example of map information according to the second modification.

(First Embodiment)
FIG. 1 is a block diagram showing an example of the hardware configuration of the information processing device 1 according to the first embodiment. As an example, the information processing device 1 includes a main body 10, a moving device 16, an imaging device 17, and an IMU (Inertial Measurement Unit) sensor 18.

The moving device 16 is a device capable of moving the information processing device 1. As an example, the moving device 16 has a plurality of wheels and a motor that drives the wheels, and is connected to the lower part of the main body 10 so as to support the main body 10.

It is assumed that the information processing device 1 can move, by means of the moving device 16, through a building under construction, a completed building, a station platform, a factory, or the like. In the present embodiment, the case where the information processing device 1 moves through a building under construction will be described as an example.

The means of movement of the information processing device 1 is not limited to wheels, and may be crawler tracks, propellers, or the like. The information processing device 1 is, for example, a robot, a drone, or the like. In the present embodiment, the information processing device 1 is assumed to move autonomously, but it is not limited to this.

The imaging device 17 is, for example, a stereo camera in which two cameras arranged side by side form one set. The imaging device 17 sends the image data captured by each of the two cameras to the main body 10 in association with each other.

The IMU sensor 18 is a sensor in which a gyro sensor, an acceleration sensor, and the like are integrated, and measures the angular velocity and acceleration of the information processing device 1. The IMU sensor 18 sends the measured angular velocity and acceleration to the main body 10. The IMU sensor 18 may further include not only a gyro sensor and an acceleration sensor but also a magnetic sensor, a GPS (Global Positioning System) device, and the like.

In the present embodiment, the imaging device 17 and the IMU sensor 18 are collectively referred to as a detection unit. The detection unit may further include various other sensors. As an example, the information processing device 1 may further include a distance measuring sensor such as an ultrasonic sensor or a laser scanner.

In the present embodiment, "detection" includes imaging the surroundings of the information processing device 1, measuring the angular velocity or acceleration of the information processing device 1, measuring the distance to objects around the information processing device 1, and the like. Further, in the present embodiment, the detection result by the detection unit includes at least one of the surrounding state of the information processing device 1 and the state of the information processing device 1. In other words, the detection result may include both information on the surrounding state of the information processing device 1 and information on the state of the information processing device 1, or may include only one of them.

The surrounding state of the information processing device 1 is, for example, a captured image of the surroundings of the information processing device 1, a measurement result of the distance between the information processing device 1 and a surrounding object, or the like. The state of the information processing device 1 is, for example, the angular velocity and acceleration measured by the IMU sensor 18. For example, a captured image captured by the imaging device 17 is an example of a detection result of the surrounding state of the information processing device 1. In the present embodiment, the detection result includes at least a captured image, but may further include other information.
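As an illustration only (the publication does not prescribe any particular data structure, and all names below are hypothetical), a detection result bundling the surrounding state and the device state could be represented as follows:

    from dataclasses import dataclass
    from typing import Optional
    import numpy as np

    @dataclass
    class DetectionResult:
        # Surrounding state: a stereo pair captured by the imaging device.
        left_image: Optional[np.ndarray] = None
        right_image: Optional[np.ndarray] = None
        # Device state: angular velocity (rad/s) and acceleration (m/s^2) from the IMU.
        angular_velocity: Optional[np.ndarray] = None
        acceleration: Optional[np.ndarray] = None
        timestamp: float = 0.0

    # Consistent with the text above, either part may be omitted: a result may carry
    # only the images, only the IMU readings, or both.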

As an example, the main body 10 includes a processor 11, a main storage device 12 (memory), an auxiliary storage device 14 (memory), a network interface 13, and a device interface 15, and may be realized as a computer in which these components are connected via a bus 19. A configuration in which the imaging device 17 and the IMU sensor 18 are built into the main body 10 may also be adopted.

The processor 11 may be an electronic circuit including a control device and an arithmetic device of a computer (a processing circuit, processing circuitry, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or the like). The processor 11 may also be a semiconductor device or the like including a dedicated processing circuit. The processor 11 is not limited to an electronic circuit using electronic logic elements, and may be realized by an optical circuit using optical logic elements. The processor 11 may also include an arithmetic function based on quantum computing.

The processor 11 can perform arithmetic processing based on data and software (programs) input from the devices of the internal configuration of the information processing device 1, and can output the arithmetic results and control signals to those devices. The processor 11 may control the components constituting the information processing device 1 by executing the OS (Operating System) of the information processing device 1, applications, and the like.

The main storage device 12 is a storage device that stores instructions executed by the processor 11, various data, and the like, and the information stored in the main storage device 12 is read out by the processor 11. The auxiliary storage device 14 is a storage device other than the main storage device 12. These storage devices mean any electronic components capable of storing electronic information, and may be semiconductor memories. A semiconductor memory may be either a volatile memory or a non-volatile memory. The storage device for storing various data in the information processing device 1 in the present embodiment may be realized by the main storage device 12 or the auxiliary storage device 14, or may be realized by a built-in memory of the processor 11. The main storage device 12 or the auxiliary storage device 14 is also referred to as a storage unit.

A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor. When the information processing device 1 in the present embodiment is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to the at least one storage device (memory), a configuration in which at least one of the plurality of processors is connected (coupled) to the at least one storage device (memory) may be included. This configuration may also be realized by storage devices (memories) and processors included in a plurality of computers. Furthermore, a configuration in which the storage device (memory) is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.

The network interface 13 is an interface for connecting to the communication network 3 wirelessly or by wire. As the network interface 13, an appropriate interface, such as one conforming to an existing communication standard, may be used. Information may be exchanged via the network interface 13 with the external device 2 connected through the communication network 3. The communication network 3 may be any of a WAN (Wide Area Network), a LAN (Local Area Network), a PAN (Personal Area Network), or a combination thereof, as long as information is exchanged between the information processing device 1 and the external device 2. Examples of a WAN include the Internet, examples of a LAN include IEEE 802.11 and Ethernet (registered trademark), and examples of a PAN include Bluetooth (registered trademark) and NFC (Near Field Communication).

The device interface 15 is an interface that directly connects to the moving device 16, the imaging device 17, and the IMU sensor 18. The device interface 15 is, for example, an interface conforming to a standard such as USB (Universal Serial Bus), but is not limited thereto. The device interface 15 may also be connected to external devices other than the various devices shown in FIG. 1.

The external device 2 is, for example, a server device or the like. The external device 2 is connected to the information processing device 1 via the communication network 3.

The external device 2 of the present embodiment stores three-dimensional design information of the building in advance. The three-dimensional design information of the building is, for example, BIM (Building Information Modeling) information. The BIM information includes information on the three-dimensional structure of the building and information on materials such as building materials. The three-dimensional design information of the building is not limited to BIM information, and may be 3D CAD (Computer-Aided Design) data or the like.

The three-dimensional design information is an example of environmental information in the present embodiment. Environmental information is information on the environment around the information processing device 1. For example, the environmental information includes at least one of information on the building in which the information processing device 1 travels, information on people or objects existing around the information processing device 1, information on the weather, and information on lighting. More specifically, the above-mentioned three-dimensional design information is an example of the information on the building in which the information processing device 1 travels. The environmental information may be a combination of a plurality of types of information, or may include only one type of information. For example, in the present embodiment, the environmental information includes at least the three-dimensional design information, but may further include other information on the environment around the information processing device 1.

The term "object" in the present embodiment includes structures such as walls and pillars, fixtures, furniture, moving bodies, temporary installations, people, and the like.

In the present embodiment, the information processing device 1 and the external device 2 are connected wirelessly, but they may instead be connected by wire. Further, the information processing device 1 does not have to be constantly connected to the external device 2.

Next, the functions of the information processing device 1 will be described. FIG. 2 is a block diagram showing an example of the functions provided in the information processing device 1 according to the first embodiment.

As shown in FIG. 2, the information processing device 1 includes an acquisition unit 101, a conversion unit 102, a SLAM (Simultaneous Localization and Mapping, or Simultaneously Localization and Mapping) processing unit 120, and a movement control unit 105. The SLAM processing unit 120 includes a tracking unit 103 and a bundle adjustment unit 104.

The acquisition unit 101 acquires a detection result of the surrounding state of the information processing device 1 or the state of the information processing device 1, and environmental information on the environment around the information processing device 1.

More specifically, the acquisition unit 101 acquires the BIM information from the external device 2 via, for example, the network interface 13. The acquisition unit 101 stores the acquired BIM information in the auxiliary storage device 14.

The acquisition unit 101 also acquires captured images from the imaging device 17 via the device interface 15. In addition, the acquisition unit 101 acquires the angular velocity and acceleration from the IMU sensor 18 via the device interface 15.

The conversion unit 102 converts the environmental information into an input value of at least one of the self-position estimation process and the map information generation process performed by the SLAM processing unit 120 described later. The conversion unit 102 may convert the environmental information into input values for both the self-position estimation process and the map information generation process, or into an input value for only one of them. In the present embodiment, "conversion" includes generating other information from the environmental information, and extracting, acquiring, or retrieving information from the environmental information.

In the present embodiment, the conversion unit 102 generates, from the BIM information, initial values of the three-dimensional coordinates (world coordinates) of points in three-dimensional space that are used in the bundle adjustment process performed by the bundle adjustment unit 104 described later.

For example, based on the current position and posture of the imaging device 17 identified by the tracking unit 103 described later, the conversion unit 102 identifies, among the structures such as walls and pillars included in the BIM information, the structures included in the imaging range of the imaging device 17. The conversion unit 102 then specifies, from the BIM information, the three-dimensional coordinates (world coordinates) of the structures included in the imaging range of the imaging device 17. As an example, the conversion unit 102 acquires from an external device or the like the world coordinates of one point of the building represented by the BIM information, and, with that point as a reference, converts the three-dimensional coordinates in the BIM information of each point of the building into world coordinates. The method of obtaining the world coordinates of each point of the building from the BIM information is not limited to this. The conversion unit 102 sends the specified three-dimensional coordinates (world coordinates) to the bundle adjustment unit 104 as initial values of the three-dimensional coordinates (world coordinates) of points in three-dimensional space used in the bundle adjustment process described later. Details of the bundle adjustment process will be described later. The three-dimensional coordinates (world coordinates) of the points of the building 9 that the conversion unit 102 specifies from the BIM information are an example of information on the positions of surrounding objects in the present embodiment. Hereinafter, unless otherwise specified, three-dimensional coordinates in the present embodiment are world coordinates.
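As a minimal sketch only (assuming the correspondence between the BIM coordinate system and the world coordinate system can be expressed as a single rigid transform, which the publication does not state explicitly; all names and numbers are illustrative), the conversion of BIM coordinates into world-coordinate initial values might look like this:

    import numpy as np

    def bim_to_world(points_bim: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
        """Convert Nx3 BIM coordinates to world coordinates as x_w = R @ x_bim + t."""
        return points_bim @ R.T + t

    # If only one reference-point correspondence is known and the axes are assumed
    # to be aligned, R is the identity and t is the offset implied by that point.
    ref_bim = np.array([10.0, 5.0, 0.0])    # reference point in BIM coordinates (illustrative)
    ref_world = np.array([2.0, -1.0, 0.0])  # the same point in world coordinates (illustrative)
    R, t = np.eye(3), ref_world - ref_bim

    wall_points_bim = np.array([[10.0, 5.0, 0.0],
                                [12.0, 5.0, 3.0]])        # points on a structure from the BIM data
    initial_values = bim_to_world(wall_points_bim, R, t)  # used to seed the bundle adjustment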

The conversion unit 102 may specify a range of initial values instead of specifying a single initial value. For example, the conversion unit 102 may give the three-dimensional coordinates of a point in three-dimensional space a range rather than specifying them as unique coordinates. In this case, a three-dimensional spatial region that is likely to contain a given point of a structure included in the imaging range of the imaging device 17 is sent to the bundle adjustment unit 104 as the range of initial values. The initial value or the range of initial values is an example of the input value generated by the conversion unit 102 in the present embodiment.

In the present embodiment, it is assumed that the three-dimensional coordinate system in the BIM information and the three-dimensional coordinate system in the SLAM processing unit 120 (the SLAM coordinate system) are calibrated in advance. In the present embodiment, calibration means that the correspondence between positions in the BIM information and positions in the SLAM coordinate system is defined. For example, the position of the movement start point of the information processing device 1 in the three-dimensional coordinate system of the BIM information may be stored in the auxiliary storage device 14 as a reference point. The conversion unit 102 can therefore specify the position in the building represented by the BIM information that corresponds to the position of the imaging device 17 identified by the tracking unit 103.

The calibration between the three-dimensional coordinate system of the BIM information and the SLAM coordinate system may be executed by an input operation by an administrator or the like, or may be executed by the SLAM processing unit 120 recognizing, from a captured image, a marker such as an AR (Augmented Reality) marker installed in the building.

The SLAM processing unit 120 performs self-position estimation and map information generation simultaneously. In the present embodiment, the self-position is the position and posture of the information processing device 1. Further, in the present embodiment, since the imaging device 17 is mounted on the information processing device 1, the position and posture of the imaging device 17 are treated as representing the position and posture of the information processing device 1. When the imaging device 17 is installed at a position away from the center of the information processing device 1, the SLAM processing unit 120 corrects the positional offset between the imaging device 17 and the center of the information processing device 1 and estimates the self-position, that is, the position of the information processing device 1.

The map information represents the shapes of surrounding structures along the movement trajectory of the information processing device 1. More specifically, the map information of the present embodiment represents, in three dimensions, the internal structure of the building in which the information processing device 1 travels, along the movement trajectory of the information processing device 1. The map information of the present embodiment is, for example, a point cloud map in which the internal structure of the building is represented as a point cloud having three-dimensional coordinates. The type of map information is not limited to this, and the map may be represented by a set of three-dimensional figures instead of a point cloud. The map information is also referred to as an environment map.

The SLAM processing unit 120 is an example of an estimation unit in the present embodiment. A method other than SLAM may be adopted for self-position estimation and map information generation. Further, the self-position estimation and the map information generation do not have to be performed simultaneously; one process may be completed first and the other executed afterwards. In addition, generating map information in this specification includes at least one of newly generating map information, adjusting generated map information, and updating generated map information.

The SLAM processing unit 120 includes a tracking unit 103 and a bundle adjustment unit 104.

The tracking unit 103 identifies the position and posture of the imaging device 17 by tracking a plurality of images captured by the imaging device 17 at different times. The tracking unit 103 is an example of a specifying unit in the present embodiment.

For example, in the present embodiment, the imaging device 17 captures its surroundings while moving together with the information processing device 1. The tracking unit 103 calculates changes in the position and posture of the imaging device 17 by tracking points drawn in one captured image across other images captured at different times. The tracking unit 103 identifies the current position and posture of the imaging device 17 by adding the changes in position and posture identified by the tracking process to the position and posture of the imaging device 17 at the start of imaging.
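Expressed as a formula (a standard composition of rigid-body transforms, shown only to illustrate the accumulation described above; the publication does not give this expression), the current pose can be written as

    T_{\mathrm{current}} = T_0 \, \Delta T_1 \, \Delta T_2 \cdots \Delta T_n

where T_0 is the pose of the imaging device 17 at the start of imaging and each ΔT_k is the relative change in position and posture estimated between successive frames.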

FIG. 3 is an image diagram showing an example of the tracking process according to the first embodiment. The reference frame 41 and the target frame 42 are images captured by the imaging device 17 at different times. The reference frame 41 is an image captured before the target frame 42, and the imaging device 17 is assumed to have moved from the position T_i at the time the reference frame 41 was captured to the position T_j at the time the target frame 42 was captured. The reference frame 41 is also referred to as a key frame, and the target frame 42 as the current frame.

In this case, the tracking unit 103 calculates the relative amount of movement of the imaging device 17 from the position T_i to the position T_j by calculating the photometric error that arises when the point P drawn in the reference frame 41 is drawn in the target frame 42. The point P is, for example, a feature point in the reference frame 41. The movement of the imaging device 17 includes both a change in the position of the imaging device 17 and a change in its posture (orientation).

It is assumed that the position T_i at the time the reference frame 41 was captured has already been error-corrected. The point 50a shown in FIG. 3 represents the position at which the point P drawn in the reference frame 41 is back-projected into three-dimensional space.

As an example, the tracking unit 103 calculates the photometric error E_pj between the reference frame 41 and the target frame 42 using the following equation (1).

[Equation (1): the photometric error E_pj, presented as an image in the original publication and not reproduced here.]

I_i represents the reference frame 41, and I_j represents the target frame 42. N_p is a neighborhood pattern of pixels including the point P in the reference frame 41. t_i is the exposure time of the reference frame 41, and t_j is the exposure time of the target frame 42. p' is the projection point of P in the target frame 42 obtained with the inverse depth d_p. As shown in equation (1), the tracking unit 103 calculates the photometric error E_pj using the Huber norm. The weighting coefficient W_p is calculated in advance from the intensity gradient of the pixels. For example, noise can be reduced by decreasing the value of the weighting coefficient W_p for pixels with a large gradient. A known method can be applied for calculating the weighting coefficient W_p. The brightness conversion hyperparameters a_i, a_j, b_i, and b_j are parameters for converting the brightness between the reference frame 41 and the target frame 42. The brightness conversion hyperparameters a_i, a_j, b_i, and b_j may be tuned manually, for example by an administrator.
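Since equation (1) is not reproduced in this text, one plausible form is given here for reference only. It is consistent with the terms described above (the Huber norm ‖·‖_γ, the weight W_p, the exposure times t_i and t_j, and the brightness conversion parameters a_i, a_j, b_i, b_j) and matches the photometric error commonly used in direct visual odometry, but it is an interpretation, not the exact expression of equation (1):

    E_{pj} = \sum_{p \in N_p} W_p \left\| \left( I_j[\,p'\,] - b_j \right) - \frac{t_j\, e^{a_j}}{t_i\, e^{a_i}} \left( I_i[\,p\,] - b_i \right) \right\|_{\gamma}

where the sum runs over the neighborhood pattern N_p of the point P.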

The following equation (2) is the constraint condition for the point P', which is the projection point of the point P used in equation (1). To calculate the point P', a back-projection function that back-projects the point P drawn in the reference frame 41 into three-dimensional space as the point 50a, and a projection function that projects the point 50a in three-dimensional space onto the target frame 42, are used. The distance from the point P to the point 50a is the depth (d_p) of the point 50a in the reference frame 41.

[Equation (2): the constraint on the projection point P', presented as an image in the original publication and not reproduced here.]

The coefficient R included in equation (2) represents the amount of rotation of the imaging device 17, and the coefficient t represents the amount of translation of the imaging device 17. The coefficients R and t are defined from the relative positions of the imaging device 17 by the following constraint condition (3).

[Equation (3): the constraint defining R and t from the relative positions of the imaging device, presented as an image in the original publication and not reproduced here.]
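For reference only, forms consistent with the description above (a back-projection of P with depth d_p followed by a projection into the target frame, with R and t defined by the relative poses) would be

    p' = \Pi\!\left( R \, \Pi^{-1}\!\left( p, d_p \right) + t \right), \qquad
    \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} = T_j \, T_i^{-1}

where Π is the projection function of the imaging device 17 and Π^{-1} the corresponding back-projection. These are interpretations based on the surrounding text, not the exact expressions of equations (2) and (3).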

The tracking unit 103 identifies the position T_j of the imaging device 17 at the time the target frame I_j was captured by solving the model of the photometric error E_pj between the reference frame I_i and the target frame I_j shown in equations (1) to (3) above. The positions T_i and T_j shown in equation (3) and FIG. 3 include the position and orientation of the imaging device 17. The tracking unit 103 thus tracks changes in the position and posture of the imaging device 17 by repeatedly executing this tracking process on the series of images captured over time by the imaging device 17.

The tracking method is not limited to the above example. For example, tracking methods include an indirect method, which acquires feature points in the captured images and then obtains the position and posture of the imaging device 17 at the time each frame was captured by solving a matching problem between the feature points, and a direct method, which estimates the position and posture of the imaging device 17 at the time each frame was captured by directly estimating the transformation between captured images without a feature point extraction step. In the above example, the movement of the position and posture of the imaging device 17 is calculated by projecting feature points, but the tracking unit 103 may instead perform tracking by the direct method. The tracking unit 103 may also identify the position and orientation of the imaging device 17 by taking into account not only the captured images but also the detection results of the IMU sensor 18.

The tracking unit 103 sends the identified current position and posture of the imaging device 17 to the bundle adjustment unit 104 and the conversion unit 102.

Returning to FIG. 2, the bundle adjustment unit 104 corrects the position and posture of the imaging device 17 identified by the tracking unit 103 and the position information of surrounding objects through a bundle adjustment process. The bundle adjustment unit 104 outputs, as the processing result, the self-position of the information processing device 1 and the map information.

More specifically, the bundle adjustment unit 104 minimizes the reprojection error for each frame of the images captured by the imaging device 17.

To do so, the bundle adjustment unit 104 minimizes the reprojection error of each frame by optimizing the world coordinate points (three-dimensional position coordinates) of the points in the surrounding environment, the position and posture of the imaging device 17, and the internal parameters of the imaging device 17.

The internal parameters of the imaging device 17 need not be updated by the bundle adjustment unit 104 if the camera has been calibrated in advance. The internal parameters of the imaging device 17 are, for example, the focal length and the principal point. In bundle adjustment, the position and posture of the imaging device 17 are also referred to as external parameters.

The bundle adjustment unit 104 of the present embodiment adopts the three-dimensional coordinates indicating the positions of surrounding structures, converted from the BIM information by the conversion unit 102, as the initial values of the world coordinate points of the surrounding environment described above.

The bundle adjustment unit 104 also adjusts, through this bundle adjustment, the error in the position and posture of the imaging device 17 identified by the tracking unit 103. Since the bundle adjustment unit 104 finds the world coordinate points, the position and posture of the imaging device 17, and the internal parameters of the imaging device 17 that minimize the reprojection error, the position and posture of the imaging device 17 with reduced error are obtained as a result. The set of world coordinate points after bundle adjustment becomes the map information.

FIG. 4 is an image diagram showing an example of the positional relationship between the information processing device 1 according to the first embodiment and surrounding objects. In the example shown in FIG. 4, the information processing device 1 moves through a building 9 in which pillars 90a to 90c are installed. The pillars 90a to 90c are examples of objects. The distance d in FIG. 4 is the distance from the imaging device 17 to a point 52 on a plane 901 of the pillar 90c facing the information processing device 1.

For example, assume that the conversion unit 102 has specified the initial value of the three-dimensional coordinates of the point 52 on the plane 901. In this case, the bundle adjustment unit 104 starts the adjustment process from that initial value and adjusts the errors in the self-position and the position of the point 52 based on the position and posture of the imaging device 17 identified by the tracking unit 103 and the captured images. For example, by adjusting the errors in the self-position and the position of the point 52, the bundle adjustment unit 104 corrects the three-dimensional coordinates of the point 52 and obtains more accurate three-dimensional coordinates. Through this adjustment process, the bundle adjustment unit 104 estimates the self-position and the three-dimensional coordinates of the point 52.
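A minimal sketch of this idea follows (not the implementation of the embodiment: camera poses are simplified to translation-only, the intrinsics and all numbers are illustrative, and SciPy's general-purpose least_squares solver stands in for the actual optimizer):

    import numpy as np
    from scipy.optimize import least_squares

    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])  # pre-calibrated internal parameters (illustrative)

    def project(point_w, cam_t):
        """Project a world point with a camera at translation cam_t (rotation omitted)."""
        p_cam = point_w - cam_t
        uv = K @ p_cam
        return uv[:2] / uv[2]

    def residuals(x, observations, n_cams):
        """Reprojection residuals; x packs the camera translations and point coordinates."""
        cams = x[:3 * n_cams].reshape(n_cams, 3)
        points = x[3 * n_cams:].reshape(-1, 3)
        res = []
        for cam_i, pt_i, uv_obs in observations:
            res.extend(project(points[pt_i], cams[cam_i]) - uv_obs)
        return np.array(res)

    # Initial values: camera positions from the tracking unit, point coordinates from BIM.
    cams0 = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
    points0 = np.array([[1.0, 0.2, 5.0]])                  # e.g. a point on a wall taken from BIM
    observations = [(0, 0, np.array([420.0, 260.0])),      # (camera index, point index, observed pixel)
                    (1, 0, np.array([435.0, 258.0]))]
    x0 = np.concatenate([cams0.ravel(), points0.ravel()])
    result = least_squares(residuals, x0, args=(observations, 2))
    # result.x holds the adjusted camera positions (self-position) and point coordinates (map points).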

There may also be other objects, such as fixtures, between the pillar 90c and the imaging device 17. Information on such objects is not included in the BIM information, but since they are drawn in the captured images, the bundle adjustment unit 104 can also estimate the positions of objects not included in the BIM information by changing the BIM-based initial values through bundle adjustment.

FIG. 5 is an image diagram showing an example of bundle adjustment according to the first embodiment. For example, using the following equation (4), the bundle adjustment unit 104 estimates the position of the imaging device 17 and the three-dimensional coordinates of the point 52 so as to minimize the errors between the projection points 401a and 401b, at which the point 52 in three-dimensional space is projected onto the two captured images 43 and 44 shown in FIG. 5, and the feature points 402a and 402b corresponding to the point 52 drawn in the captured images 43 and 44. Hereinafter, when the captured images 43 and 44 are distinguished, the captured image 43 is referred to as the first image and the captured image 44 as the second image for convenience. In the present embodiment, the internal parameters of the imaging device 17 are assumed to have been calibrated in advance and are not included in the parameters to be optimized in equation (4).

[Equation (4): the reprojection error minimized by the bundle adjustment, presented as an image in the original publication and not reproduced here.]

The initial value generated by the conversion unit 102 described above is used in equation (4) as the initial value of the world coordinate point shown as the point 52 (the point X_i in three-dimensional space). In FIG. 5, the lines connecting the point 52 with the reference points 170a and 170b representing the position of the imaging device 17 are called ray bundles 6a and 6b. When a range of initial values has been set by the conversion unit 102, the computation of equation (4) is started with world coordinates contained in that range set for the point X_i in three-dimensional space. If the error is not minimized by computation within the range of initial values, the optimal value of the point X_i in three-dimensional space may be sought beyond that range.
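For reference only, a standard reprojection-error objective consistent with the description above (and not necessarily the exact expression of equation (4)) is

    \min_{\{T_j\},\,\{X_i\}} \; \sum_{i}\sum_{j} \rho\!\left( \left\| \mathbf{x}_{ij} - \pi\!\left( T_j, X_i \right) \right\|^{2} \right)

where X_i is a point in three-dimensional space (such as the point 52), T_j is the position and posture of the imaging device 17 for the j-th image, x_ij is the feature point observed for X_i in that image, π is the projection through the pre-calibrated internal parameters, and ρ is an optional robust loss.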

The bundle adjustment unit 104 also estimates the positions of planes or curved surfaces of surrounding objects based on the BIM information, and calculates the distances from the imaging device 17 to the surrounding objects under the constraint condition that a plurality of surrounding points are located on a plane or curved surface. For example, the points 50b to 50d shown in FIG. 4 all lie on the plane 901. In such a case, the bundle adjustment unit 104 imposes a constraint condition given by the plane equation when executing the bundle adjustment process based on captured images whose imaging range includes the plane 901. Hereinafter, when the individual points 50a to 50d in three-dimensional space are not particularly distinguished, they are simply referred to as points 50.

For example, as shown in equations (5) and (6), the bundle adjustment unit 104 estimates the positions of the points 50 and the position of the imaging device 17, with the existence of the plane as a constraint condition, by solving an optimization problem by the nonlinear least squares method with nonlinear functions f(x) and g(x). The function f(x) corresponds to equation (4) above. The penalty method or the augmented Lagrangian method can be applied to solve equations (5) and (6), but other solution methods may also be adopted.

[Equations (5) and (6): the constrained nonlinear least squares problem, presented as an image in the original publication and not reproduced here.]
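Read together with the description above, equations (5) and (6) can be understood as a constrained problem of the following general form (an interpretation, not the exact expressions):

    \min_{x} f(x) \quad \text{subject to} \quad g(x) = 0

where f(x) is the reprojection error of equation (4) and g(x) stacks, for example, plane constraints n_k·X_i + d_k = 0 for every point X_i that the BIM information indicates should lie on plane k. A penalty formulation replaces this by minimizing f(x) + μ‖g(x)‖² with an increasing penalty weight μ, while the augmented Lagrangian method additionally carries multiplier estimates for the constraints.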

The structures around the information processing device 1 may have not only planes but also curved surfaces. For example, the outer surface of the pillar 90b is a curved surface. In such a case, the bundle adjustment unit 104 may impose a constraint condition given by the equation of the curved surface, based on the BIM information, so that points in three-dimensional space lie on the curved surface.

The bundle adjustment unit 104 generates, as map information, a point cloud having three-dimensional coordinates based on the three-dimensional coordinates of the plurality of points 50 after bundle adjustment. The bundle adjustment unit 104 also updates the map information by adding new points 50 to, or deleting points 50 from, the map information. Furthermore, the bundle adjustment unit 104 may adjust the self-position estimation result and the map information by additionally taking into account the detection results of the IMU sensor 18.

In the bundle adjustment process, the bundle adjustment unit 104 calculates the positions of surrounding objects as the spatial coordinates of the plurality of points 50 in three-dimensional space, and outputs the calculated spatial coordinates of the plurality of points 50 as map information. In the present embodiment, "output" includes saving to the auxiliary storage device 14 or transmitting to the external device 2.

For example, the bundle adjustment unit 104 stores the estimated self-position and the generated map information in the auxiliary storage device 14. The bundle adjustment unit 104 may also transmit the estimated self-position and the generated map information to the external device 2.

The movement control unit 105 moves the information processing device 1 by controlling the moving device 16. For example, the movement control unit 105 searches for a movable route based on the map information stored in the auxiliary storage device 14 and the current self-position. The movement control unit 105 then controls the moving device 16 based on the search result.
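As an illustration only (the publication does not specify the route search algorithm; the occupancy grid, its resolution, and all names here are hypothetical), a movable route could be searched on a grid derived from the point cloud map, for example by breadth-first search:

    from collections import deque

    def search_route(grid, start, goal):
        """grid[y][x] == 0 means free space; returns a list of (y, x) cells or None."""
        h, w = len(grid), len(grid[0])
        prev = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                route = []
                while cur is not None:
                    route.append(cur)
                    cur = prev[cur]
                return route[::-1]
            y, x = cur
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] == 0 and (ny, nx) not in prev:
                    prev[(ny, nx)] = cur
                    queue.append((ny, nx))
        return None

    # Cells marked 1 would be derived from map points (and, if available, obstacle
    # detections from the distance measuring sensors).
    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(search_route(grid, (0, 0), (2, 0)))  # a route from the current cell to the goal cell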

When the information processing device 1 is provided with distance measuring sensors such as an ultrasonic sensor or a laser scanner, the movement control unit 105 may generate a movement route that avoids obstacles based on the obstacle detection results of these sensors. The movement control method of the information processing device 1 is not limited to these, and various autonomous movement methods can be applied.

Next, the flow of the self-position estimation and map information generation process executed by the information processing device 1 of the present embodiment configured as described above will be described.

FIG. 6 is a flowchart showing an example of the flow of the self-position estimation and map information generation process according to the first embodiment.

First, the acquisition unit 101 acquires the BIM information from the external device 2 (S1). The acquisition unit 101 stores the acquired BIM information in the auxiliary storage device 14.

Then, the movement control unit 105 starts the movement of the information processing device 1 by controlling the moving device 16 (S2).

Next, the acquisition unit 101 acquires captured images from the imaging device 17. The acquisition unit 101 also acquires sensing results such as the angular velocity and acceleration from the IMU sensor 18 (S3).

Next, the tracking unit 103 identifies the current position and posture of the imaging device 17 based on the captured images (S4).

Then, the conversion unit 102 generates, from the BIM information, initial values of the three-dimensional coordinates of points on the structures, based on the current position and posture of the imaging device 17 identified by the tracking unit 103 (S5).

Then, the bundle adjustment unit 104 executes the bundle adjustment process (S6). Specifically, based on the initial values of the three-dimensional coordinates of points on the structures around the imaging device 17 generated from the BIM information, the position and posture of the imaging device 17 identified by the tracking unit 103, and the captured images, the bundle adjustment unit 104 calculates the distances from the imaging device 17 to surrounding objects and estimates the position and posture of the imaging device 17 and the three-dimensional coordinates of the surrounding objects. The bundle adjustment unit 104 also generates map information based on the estimated three-dimensional coordinates of the surrounding objects.

The bundle adjustment unit 104 stores the estimated self-position and the generated map information in, for example, the auxiliary storage device 14.

The movement control unit 105 also searches for a movement route based on the map information stored in the auxiliary storage device 14 and the current self-position, and moves the information processing device 1 by controlling the moving device 16 based on the search result.

Then, the movement control unit 105 determines whether or not to end the movement of the information processing device 1 (S7). For example, when the information processing device 1 has arrived at a predetermined end point, the movement control unit 105 determines that the movement of the information processing device 1 is to be ended. The conditions for determining the end of movement are not particularly limited; for example, the movement control unit 105 may determine that the movement of the information processing device 1 is to be ended when an instruction to end the movement is input from the outside via the communication network 3.

If the movement control unit 105 does not determine that the movement is to be ended (S7 "No"), the process returns to S3, and the processes of S3 to S7 are repeated. If the movement control unit 105 determines that the movement is to be ended (S7 "Yes"), the process of this flowchart ends.

As described above, the information processing device 1 of the present embodiment executes self-position estimation and map information generation based on the BIM information and the images captured around the information processing device 1. Therefore, according to the information processing device 1 of the present embodiment, the accuracy of the self-position estimation and of the map information can be improved by using the BIM information in the self-position estimation and map information generation processes.

For example, the information processing device 1 of the present embodiment converts the BIM information into an input value for at least one of the self-position estimation process and the map information generation process performed by the SLAM processing unit 120, and executes the self-position estimation and the map information generation based on that input value. This improves the accuracy of the self-position estimation and the map information compared with estimating the self-position and generating the map information based only on detection results of the surroundings, such as captured images.

The information processing device 1 of the present embodiment also identifies the position and posture of the imaging device 17 by tracking a plurality of images captured at different times, and calculates the distances from the imaging device 17 to surrounding objects, the self-position, and the positions of the surrounding objects by changing the initial values, or the range of initial values, of the three-dimensional coordinates of points on surrounding objects based on the BIM information, using the position and posture of the imaging device 17 identified by the tracking and the captured images. More specifically, the information processing device 1 of the present embodiment uses the BIM-based distances from the imaging device 17 to surrounding objects as initial values, or a range of initial values, in the bundle adjustment process. Therefore, according to the information processing device 1 of the present embodiment, the amount of computation in the bundle adjustment process can be reduced by using these BIM-based distances as the initial values, or range of initial values, of the three-dimensional coordinates of points on the surrounding objects.

For example, as a comparison, a general bundle adjustment process may assume that the initial values of the three-dimensional coordinates of points in three-dimensional space are at infinity. In such a case, the bundle adjustment process starts without it being determined whether the distance between a point in three-dimensional space and the imaging device is, for example, 1 m or 1000 m, so the amount of computation required for the calculation result to converge may increase. In contrast, since the information processing device 1 of the present embodiment uses initial values based on the BIM information, the processing result can be converged with a small amount of computation.

 Further, the information processing device 1 of the present embodiment estimates the position of a plane or curved surface of a surrounding object based on the BIM information, and calculates the distance from the imaging device 17 to the surrounding object under the constraint that a plurality of surrounding points lie on that plane or curved surface. Therefore, according to the information processing device 1 of the present embodiment, the amount of computation can be reduced compared with obtaining the positions of a plurality of points lying on the same plane or curved surface individually.
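As one way to picture this planar constraint, the sketch below parameterizes points that the BIM model places on the same wall by two coordinates on that plane, so an optimizer would estimate two unknowns per point instead of three. The helper names and the example wall at x = 5 m are assumptions, not part of the embodiment.

```python
import numpy as np

def plane_basis(normal, origin):
    """Return (origin, u, v): an orthonormal basis spanning the plane."""
    n = normal / np.linalg.norm(normal)
    u = np.cross(n, [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-6:            # normal happens to be parallel to z
        u = np.cross(n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    return origin, u, v

def point_on_plane(params_2d, origin, u, v):
    """Map 2D optimization parameters (a, b) to a 3D point constrained to the plane."""
    a, b = params_2d
    return origin + a * u + b * v

# A wall at x = 5 m taken from the BIM model (illustrative values).
origin, u, v = plane_basis(normal=np.array([1.0, 0.0, 0.0]),
                           origin=np.array([5.0, 0.0, 0.0]))
X = point_on_plane((1.2, 0.8), origin, u, v)
print(X)
```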

 Further, the information processing device 1 of the present embodiment calculates the positions of the surrounding objects as the spatial coordinates of a plurality of points 50 in three-dimensional space, and outputs the calculated spatial coordinates of the plurality of points 50 as map information. According to the information processing device 1 of the present embodiment, more accurate map information can be provided by outputting, as map information, the positions of the surrounding objects calculated by the bundle adjustment process using the BIM information.

 The information processing device 1 may be a robot or the like having functions such as monitoring, security, cleaning, or delivery of packages. In this case, the information processing device 1 realizes these functions by moving through the building 9 based on the estimated self-position and the map information. The map information generated by the information processing device 1 may be used not only for generating the movement route of the information processing device 1 itself but also for monitoring or managing the building 9 from a remote location. The map information generated by the information processing device 1 may also be used to generate the movement route of a robot or drone other than the information processing device 1.

 The imaging device 17 is not limited to a stereo camera. For example, the imaging device 17 may be an RGB-D camera having an RGB (Red Green Blue) camera and a three-dimensional measurement camera (depth camera), a monocular camera, or the like.

 また、情報処理装置1が備えるセンサは、IMUセンサ18に限定されるものではなく、ジャイロセンサ、加速度センサ、磁気センサ等が個別に設けられてもよい。 Further, the sensor included in the information processing device 1 is not limited to the IMU sensor 18, and a gyro sensor, an acceleration sensor, a magnetic sensor, or the like may be individually provided.

 Further, in the present embodiment, the SLAM processing unit 120 executes visual SLAM using captured images, but SLAM that does not use captured images may also be adopted. For example, the information processing device 1 may detect the surrounding structures by Lidar (Light Detection and Ranging, or Laser Imaging Detection and Ranging) or the like instead of the imaging device 17. In this case, the SLAM processing unit 120 may identify the position and orientation of the information processing device 1 based on the ranging results obtained by the Lidar.

 また、本実施形態では、SLAM処理部120は3次元の地図情報を生成するとしたが、2次元の地図情報を生成するものとしてもよい。 Further, in the present embodiment, the SLAM processing unit 120 is supposed to generate three-dimensional map information, but it may be possible to generate two-dimensional map information.

 Further, equations (1) to (6) illustrated in the present embodiment are merely examples, and the mathematical expressions used in the tracking process or the bundle adjustment process are not limited to these. The bundle adjustment unit 104 may also perform the bundle adjustment according to equation (4) without imposing the constraints of equations (5) and (6). Further, the tracking process may be executed without using the neighborhood pattern N_p.

 Further, the SLAM processing unit 120 may estimate the self-position and generate the map information by a method other than the tracking process or the bundle adjustment process. For example, although the tracking unit 103 is given as an example of the specifying unit in the present embodiment, a method of identifying changes in the position and orientation of the information processing device 1 other than tracking may be adopted.

 また、SLAM処理には、本実施形態で例示した処理以外に、自己位置推定または地図情報の精度を向上させるための各種の処理が追加されてもよい。例えば、SLAM処理部120は、ループ閉じ込み処理等をさらに実行してよい。 Further, in addition to the processes exemplified in this embodiment, various processes for improving the accuracy of self-position estimation or map information may be added to the SLAM process. For example, the SLAM processing unit 120 may further execute a loop closing process or the like.

 A part or all of the information processing device 1 in the above-described embodiment may be configured by hardware, or may be configured by information processing of software (a program) executed by a CPU, a GPU, or the like. When configured by information processing of software, the software that realizes at least some of the functions of each device in the above-described embodiment may be stored in a non-transitory storage medium (non-transitory computer-readable medium) such as a flexible disk, a CD-ROM (Compact Disc-Read Only Memory), or a USB memory, and the information processing of the software may be executed by loading it into a computer. The software may also be downloaded via a communication network. Furthermore, the information processing may be executed by hardware by implementing the software in a circuit such as an ASIC or an FPGA.

 ソフトウェアを収納する記憶媒体の種類は限定されるものではない。記憶媒体は、磁気ディスク、又は光ディスク等の着脱可能なものに限定されず、ハードディスク、又はメモリ等の固定型の記憶媒体であってもよい。また、記憶媒体は、コンピュータ内部に備えられてもよいし、コンピュータ外部に備えられてもよい。 The type of storage medium that stores the software is not limited. The storage medium is not limited to a removable one such as a magnetic disk or an optical disk, and may be a fixed storage medium such as a hard disk or a memory. Further, the storage medium may be provided inside the computer or may be provided outside the computer.

 Further, in FIG. 1, the information processing device 1 includes one of each component, but it may include a plurality of the same components. In addition, although one information processing device 1 is shown in FIG. 1, the software may be installed on a plurality of computers, and each of the plurality of computers may execute the same or a different part of the processing of the software. In this case, a form of distributed computing may be adopted in which the computers communicate with each other via the network interface 13 or the like to execute the processing. That is, the information processing device 1 in the above-described embodiment may be configured as a system in which one or more computers execute instructions stored in one or more storage devices to realize the functions. A configuration may also be adopted in which information transmitted from a terminal is processed by one or more computers provided on a cloud and the processing result is transmitted back to the terminal.

 上述した実施形態における情報処理装置1の各種演算は、1又は複数のプロセッサを用いて、又は、ネットワークを介した複数台のコンピュータを用いて、並列処理で実行されてもよい。また、各種演算が、プロセッサ内に複数ある演算コアに振り分けられて、並列処理で実行されてもよい。また、本開示の処理、手段等の一部又は全部は、ネットワークを介して情報処理装置1通信可能なクラウド上に設けられたプロセッサ及び記憶装置の少なくとも一方により実行されてもよい。このように、上述した実施形態における各装置は、1台又は複数台のコンピュータによる並列コンピューティングの形態であってもよい。 Various operations of the information processing device 1 in the above-described embodiment may be executed in parallel processing by using one or a plurality of processors or by using a plurality of computers via a network. Further, various operations may be distributed to a plurality of arithmetic cores in the processor and executed in parallel processing. In addition, some or all of the processes, means, etc. of the present disclosure may be executed by at least one of a processor and a storage device provided on the cloud capable of communicating with the information processing device 1 via the network. As described above, each device in the above-described embodiment may be in the form of parallel computing by one or a plurality of computers.

 The information processing device 1 in the above-described embodiment may be realized by one or more processors 11. Here, the processor 11 may refer to one or more electronic circuits arranged on one chip, or to one or more electronic circuits arranged on two or more chips or two or more devices. When a plurality of electronic circuits are used, the electronic circuits may communicate with each other by wire or wirelessly.

 A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor. When the information processing device 1 in the above-described embodiment is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to the at least one storage device (memory), a configuration may be included in which at least one of the plurality of processors is connected (coupled) to the at least one storage device (memory). This configuration may also be realized by storage devices (memories) and processors included in a plurality of computers. Furthermore, a configuration in which a storage device (memory) is integrated with a processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.

 また、外部装置2は、サーバ装置に限定されるものではない。また、外部装置2は、クラウド環境に設けられてもよい。また、外部装置2を、請求の範囲における情報処理装置の一例としてもよい。 Further, the external device 2 is not limited to the server device. Further, the external device 2 may be provided in a cloud environment. Further, the external device 2 may be used as an example of the information processing device in the claims.

 また、他の一例として、外部装置2は、入力装置であってもよい。また、デバイスインタフェース15は、移動装置16、撮像装置17、IMUセンサ18だけではなく、入力装置と接続するものとしてもよい。入力装置は、例えば、カメラ、マイクロフォン、モーションキャプチャ、各種センサ、キーボード、マウス、又はタッチパネル等のデバイスであり、取得した情報を情報処理装置1に与える。また、パーソナルコンピュータ、タブレット端末、又はスマートフォン等の入力部とメモリとプロセッサを備えるデバイスであってもよい。 Further, as another example, the external device 2 may be an input device. Further, the device interface 15 may be connected not only to the mobile device 16, the image pickup device 17, and the IMU sensor 18, but also to the input device. The input device is, for example, a device such as a camera, a microphone, a motion capture, various sensors, a keyboard, a mouse, or a touch panel, and gives the acquired information to the information processing device 1. Further, it may be a device including an input unit, a memory and a processor such as a personal computer, a tablet terminal, or a smartphone.

 また、他の一例として、外部装置2は、出力装置でもよい。また、デバイスインタフェース15は、出力装置と接続するものとしてもよい。出力装置は、例えば、LCD(Liquid Crystal Display)、CRT(Cathode Ray Tube)、PDP(Plasma Display Panel)、又は有機EL(Electro Luminescence)パネル等の表示装置であってもよいし、音声等を出力するスピーカ等であってもよい。また、パーソナルコンピュータ、タブレット端末、又はスマートフォン等の出力部とメモリとプロセッサを備えるデバイスであってもよい。 Further, as another example, the external device 2 may be an output device. Further, the device interface 15 may be connected to the output device. The output device may be, for example, a display device such as an LCD (Liquid Crystal Display), a CRT (Cathode Ray Tube), a PDP (Plasma Display Panel), or an organic EL (Electro Luminescence) panel, and outputs audio or the like. It may be a speaker or the like. Further, it may be a device including an output unit such as a personal computer, a tablet terminal, or a smartphone, a memory, and a processor.

 また、他の一例として、外部装置2は、記憶装置(メモリ)であってもよい。また、デバイスインタフェース15は、記憶装置(メモリ)と接続するものとしてもよい。例えば、外部装置2はネットワークストレージ等であってもよく、デバイスインタフェース15にはHDD等のストレージが接続するものとしてもよい。 Further, as another example, the external device 2 may be a storage device (memory). Further, the device interface 15 may be connected to a storage device (memory). For example, the external device 2 may be a network storage or the like, and a storage such as an HDD may be connected to the device interface 15.

 また、外部装置2、または、デバイスインタフェース15に接続する外部装置は、上述した実施形態における情報処理装置1の構成要素の一部の機能を有する装置でもよい。つまり、情報処理装置1は、外部装置2または、デバイスインタフェース15に接続する外部装置の処理結果の一部又は全部を送信又は受信してもよい。 Further, the external device 2 or the external device connected to the device interface 15 may be a device having some functions of the components of the information processing device 1 in the above-described embodiment. That is, the information processing device 1 may transmit or receive a part or all of the processing results of the external device 2 or the external device connected to the device interface 15.

 Further, while executing the self-position estimation process and the map information generation process, the information processing device 1 may be constantly connected to the external device 2 via the communication network 3, but the connection is not limited to this. For example, the information processing device 1 may keep the connection with the external device 2 offline while executing the self-position estimation process and the map information generation process.

 In the present embodiment, the world coordinates of the points included in the building 9, which the conversion unit 102 identifies from the BIM information, are used as an example of the information on the positions of the surrounding objects, but the information on the positions of the surrounding objects is not limited to this.

 例えば、変換部102は、BIM情報と、撮像装置17の位置および姿勢とに基づいて、情報処理装置1と周囲の物体との距離を示す情報を生成しても良い。撮像装置17の位置および姿勢は、例えば、トラッキング部103によって特定された位置および姿勢を採用することができる。 For example, the conversion unit 102 may generate information indicating the distance between the information processing device 1 and a surrounding object based on the BIM information and the position and orientation of the image pickup device 17. As the position and orientation of the image pickup apparatus 17, for example, the position and orientation specified by the tracking unit 103 can be adopted.

 In this case, the distance between the point X_i in three-dimensional space in equation (4) and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 is specified by the distance information generated by the conversion unit 102. At the start of the calculation, the bundle adjustment unit 104 sets the value of each parameter so that it matches the distance, specified by the distance information generated by the conversion unit 102, between the point X_i in three-dimensional space and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17. Also in this method, the point X_i in three-dimensional space and the position of the imaging device 17 obtained as a result of the error adjustment by the bundle adjustment may differ from the results obtained from the BIM information.

 Further, for example, the conversion unit 102 may identify the dimensions of the building 9 from the BIM information and use these dimensions as an example of the information on the positions of the surrounding objects. In this case, the bundle adjustment unit 104 performs the bundle adjustment so that the point X_i in three-dimensional space in equation (4) and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 fall within the range of the dimensions of the building 9 identified from the BIM information. For example, when the building 9 identified from the BIM information is 50 m long and 50 m wide, a point X_i in three-dimensional space belonging to the structure of the building 9 is never 50 m or more away from the position of the imaging device 17, so the range of values that the point X_i and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 can take is limited. In this way, the bundle adjustment unit 104 can reduce the amount of computation by performing the bundle adjustment using the information on the positions of the surrounding objects converted from the BIM information by the conversion unit 102.
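One possible way to express this "within the building dimensions" constraint in code is to bound the optimization variables by the building extent taken from the BIM model, for example with scipy's bounded least squares. The residual function, the 50 m x 50 m x 10 m extent, and the toy ray observations below are illustrative assumptions rather than the equations of the embodiment.

```python
import numpy as np
from scipy.optimize import least_squares

# Building extent read from the BIM model (illustrative corner coordinates).
building_min = np.array([0.0, 0.0, 0.0])
building_max = np.array([50.0, 50.0, 10.0])

def residuals(point_xyz, observed_rays):
    """Toy reprojection-style residual: distance of the point from each observed ray."""
    res = []
    for origin, direction in observed_rays:
        d = point_xyz - origin
        res.extend(d - np.dot(d, direction) * direction)   # component off the ray
    return np.asarray(res)

rays = [(np.array([0.0, 0.0, 1.5]), np.array([1.0, 0.0, 0.0])),
        (np.array([0.0, 5.0, 1.5]), np.array([0.8, -0.6, 0.0]))]
x0 = np.array([25.0, 25.0, 1.5])            # start inside the BIM extent
sol = least_squares(residuals, x0, args=(rays,),
                    bounds=(building_min, building_max))
print(sol.x)                                 # stays within the building dimensions
```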

(Second Embodiment)
 In the first embodiment described above, the environmental information is three-dimensional design information such as BIM information or 3D CAD data. In this second embodiment, the environmental information includes at least one of entry/exit information of persons in the building 9 and image recognition results of persons in the captured images. The environmental information may include both the entry/exit information of persons in the building 9 and the image recognition results of persons in the captured images, or only one of them.

 図7は、第2の実施形態に係る情報処理装置1が備える機能の一例を示すブロック図である。図7に示すように、本実施形態の情報処理装置1は、取得部1101と、変換部1102と、SLAM処理部1120と、移動制御部105とを備える。また、SLAM処理部1120は、トラッキング部1103と、バンドル調整部1104とを含む。また、変換部1102は、初期値生成部106と、マスク情報生成部107とを含む。 FIG. 7 is a block diagram showing an example of the functions included in the information processing device 1 according to the second embodiment. As shown in FIG. 7, the information processing apparatus 1 of the present embodiment includes an acquisition unit 1101, a conversion unit 1102, a SLAM processing unit 1120, and a movement control unit 105. Further, the SLAM processing unit 1120 includes a tracking unit 1103 and a bundle adjusting unit 1104. Further, the conversion unit 1102 includes an initial value generation unit 106 and a mask information generation unit 107.

 移動制御部105は、第1の実施形態と同様の機能を備える。 The movement control unit 105 has the same function as that of the first embodiment.

 本実施形態の取得部1101は、第1の実施形態と同様の機能を備えた上で、建物9における人物の入退出情報を取得する。 The acquisition unit 1101 of the present embodiment has the same function as that of the first embodiment, and acquires the entry / exit information of the person in the building 9.

 入退出情報は、建物9の部屋ごとまたはフロアごとに入退出した人物の人数と、入退出の時刻とを表す情報である。例えば、建物9の部屋またはフロアの出入り口に、入退出を検出するセンサが設置され、センサによる検出結果が外部装置2に送信されているものとする。この場合、取得部1101は、外部装置2から入退出情報を取得する。なお、人物の入退出の検出方法はセンサに限定されるものではない。例えば、入退出情報は、カードリーダによるセキュリティカードの読み取り記録や、建物9に設置された監視カメラの撮像画像からの人物の検出結果であってもよい。 The entry / exit information is information indicating the number of people entering / exiting each room or floor of the building 9 and the time of entering / exiting. For example, it is assumed that a sensor for detecting entry / exit is installed at the entrance / exit of a room or floor of the building 9, and the detection result by the sensor is transmitted to the external device 2. In this case, the acquisition unit 1101 acquires the entry / exit information from the external device 2. The method of detecting the entry / exit of a person is not limited to the sensor. For example, the entry / exit information may be a reading record of a security card by a card reader or a detection result of a person from an image captured by a surveillance camera installed in a building 9.
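A possible (assumed) record layout for this entry/exit information is sketched below: one event per entry or exit, keyed by room or floor, from which the occupancy at an image's capture time can be summed. The field names and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class EntryExitEvent:
    area_id: str          # room or floor identifier
    timestamp: datetime   # time of the entry or exit
    delta: int            # +1 for an entry, -1 for an exit

def occupancy_at(events: List[EntryExitEvent], area_id: str, when: datetime) -> int:
    """Number of people believed to be in the area at the given time."""
    return sum(e.delta for e in events
               if e.area_id == area_id and e.timestamp <= when)

events = [EntryExitEvent("room-3F-301", datetime(2021, 4, 1, 9, 0), +1),
          EntryExitEvent("room-3F-301", datetime(2021, 4, 1, 9, 30), +1),
          EntryExitEvent("room-3F-301", datetime(2021, 4, 1, 10, 0), -1)]
print(occupancy_at(events, "room-3F-301", datetime(2021, 4, 1, 9, 45)))  # -> 2
```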

 取得部1101は、取得した入退出情報を、補助記憶装置14に保存する。 The acquisition unit 1101 stores the acquired entry / exit information in the auxiliary storage device 14.

 The conversion unit 1102 of the present embodiment has the same functions as in the first embodiment and, in addition, generates, based on the environmental information, mask information representing an area in the building 9 in which the information processing device 1 is located that is to be excluded from the target of map information generation.

 本実施形態において、環境情報は、少なくとも入退出情報、または人物の画像認識結果の一方を含むものとする。人物の画像認識結果は、撮像装置17によって撮像された撮像画像から、画像処理によって人物が認識された結果である。 In the present embodiment, the environmental information includes at least one of the entry / exit information and the image recognition result of the person. The image recognition result of a person is a result of recognizing a person by image processing from the captured image captured by the image pickup device 17.

 具体的には、本実施形態の環境情報は、入退出情報と画像認識結果との両方と、第1の実施形態と同様の3次元設計情報とを含む。 Specifically, the environmental information of the present embodiment includes both the entry / exit information and the image recognition result, and the same three-dimensional design information as that of the first embodiment.

 より詳細には、変換部1102は、初期値生成部106と、マスク情報生成部107とを含む。初期値生成部106は、第1の実施形態における変換部102と同様の機能を備える。 More specifically, the conversion unit 1102 includes an initial value generation unit 106 and a mask information generation unit 107. The initial value generation unit 106 has the same function as the conversion unit 102 in the first embodiment.

 また、マスク情報は、地図情報の生成の対象から除外される領域を表す情報である。 The mask information is information representing an area excluded from the target of map information generation.

 The mask information generation unit 107 determines an area in the building 9 where a person is located based on the entry/exit information or the image recognition result of the person, and sets the determined area as an area to be excluded from the target of map information generation.

 The mask information generation unit 107 recognizes persons from the captured image captured by the imaging device 17 by image processing. When it is difficult to determine by image processing whether an object depicted in the captured image is a person, the mask information generation unit 107 determines, based on the entry/exit information, whether a person was present in the room or floor where the captured image was captured at the time the image was captured. When the mask information generation unit 107 determines that a person was present in that room or floor at the capture time, it estimates the probability that the object depicted in the captured image is a person to be higher than when it determines that no person was present.
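The decision just described can be pictured as combining a detector confidence with the occupancy derived from the entry/exit information, as in the hedged sketch below; the 0.5 threshold and the 0.3 boost are arbitrary illustrative values, not values given in the embodiment.

```python
def person_probability(detector_score: float, room_occupied: bool,
                       boost: float = 0.3) -> float:
    """Combine a detector confidence in [0, 1] with room occupancy information."""
    if room_occupied:
        return min(1.0, detector_score + boost)   # people known to be present
    return detector_score

def should_mask(detector_score: float, room_occupied: bool,
                threshold: float = 0.5) -> bool:
    """Mask the region (exclude it from map generation) if it is likely a person."""
    return person_probability(detector_score, room_occupied) >= threshold

print(should_mask(0.35, room_occupied=True))    # True  - boosted above the threshold
print(should_mask(0.35, room_occupied=False))   # False - stays ambiguous
```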

 図8は、第2の実施形態に係る情報処理装置1と周囲の物体との位置関係の一例を示すイメージ図である。図8に示す例では、建物9において、情報処理装置1が存在する部屋に、人物70が存在する。柱90a~90cなどとは異なり、人物70は移動するため、地図情報に人物70の存在を含めると、地図情報の精度が低下する可能性がある。 FIG. 8 is an image diagram showing an example of the positional relationship between the information processing device 1 and surrounding objects according to the second embodiment. In the example shown in FIG. 8, in the building 9, the person 70 exists in the room where the information processing device 1 exists. Unlike the pillars 90a to 90c, the person 70 moves, so if the presence of the person 70 is included in the map information, the accuracy of the map information may decrease.

 マスク情報生成部107は、人物70が存在する領域80を表すマスク情報を生成する。マスク情報は、例えば、人物70が存在する領域80を3次元座標で表す。 The mask information generation unit 107 generates mask information representing the area 80 in which the person 70 exists. The mask information represents, for example, the area 80 in which the person 70 exists in three-dimensional coordinates.

 In the present embodiment, the mask information generation unit 107 generates the mask information using both the entry/exit information and the image recognition result of the person, but the mask information may be generated based on only one of them.

 Further, the mask information generation unit 107 may detect, by image recognition from the captured image captured by the imaging device 17, objects such as moving bodies such as vehicles, or equipment temporarily present in the building 9 such as carts. In this case, the mask information generation unit 107 sets the areas in which these objects are determined to be present as areas to be excluded from the target of map information generation.

 マスク情報生成部107は、地図情報の生成の対象から除外する領域を表すマスク情報を生成し、SLAM処理部1120に送出する。 The mask information generation unit 107 generates mask information representing an area to be excluded from the target of map information generation, and sends it to the SLAM processing unit 1120.

 図7に戻り、本実施形態のSLAM処理部1120は、第1の実施形態の機能を備えた上で、マスク情報に相当する領域については、地図情報を生成しない。 Returning to FIG. 7, the SLAM processing unit 1120 of the present embodiment has the functions of the first embodiment and does not generate map information for the area corresponding to the mask information.

 More specifically, when the tracking unit 1103 of the SLAM processing unit 1120 of the present embodiment executes the tracking process according to equation (1) as in the first embodiment, if the neighborhood pattern N_p corresponds to an image area in which the area represented by the mask information is depicted, the tracking unit 1103 multiplies the neighborhood pattern N_p by a mask value. The mask value is, for example, "0" or "1", but is not limited to these. The method of applying the mask is not limited to this, and other methods may be adopted.
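A minimal sketch of this masking step is shown below: each pixel of the neighborhood pattern N_p contributes to the tracking error only where its mask value is 1. The quadratic photometric error used here is a generic stand-in, not the exact expression (1) of the embodiment.

```python
import numpy as np

def masked_tracking_error(ref_patch, cur_patch, mask):
    """Sum of squared intensity differences over the pattern; mask is 0 or 1 per pixel."""
    ref = np.asarray(ref_patch, dtype=float)
    cur = np.asarray(cur_patch, dtype=float)
    m = np.asarray(mask, dtype=float)
    return float(np.sum(m * (ref - cur) ** 2))

ref = [[10, 12], [11, 13]]
cur = [[10, 50], [11, 13]]          # one pixel falls on a masked (person) region
mask = [[1, 0], [1, 1]]             # that pixel is excluded from the error
print(masked_tracking_error(ref, cur, mask))   # -> 0.0
```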

 また、本実施形態のSLAM処理部1120のバンドル調整部1104は、第1の実施形態の機能を備えた上で、マスク情報に相当する領域については、バンドル調整の対象外とする。 Further, the bundle adjustment unit 1104 of the SLAM processing unit 1120 of the present embodiment has the functions of the first embodiment, and the area corresponding to the mask information is excluded from the bundle adjustment.

 次に、以上のように構成された本実施形態の情報処理装置1で実行される自己位置推定および地図情報の生成処理の流れについて説明する。 Next, the flow of self-position estimation and map information generation processing executed by the information processing device 1 of the present embodiment configured as described above will be described.

 図9は、第2の実施形態に係る自己位置推定および地図情報の生成処理の流れの一例を示すフローチャートである。 FIG. 9 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the second embodiment.

 The process of acquiring the BIM information in S1 is the same as in the first embodiment described with reference to FIG. 6. Next, the acquisition unit 1101 acquires the entry/exit information (S21). The acquisition unit 1101 stores the acquired entry/exit information in the auxiliary storage device 14.

 The processes from the start of the movement of the information processing device 1 in S2 to the acquisition of the captured image and of the sensing results such as angular velocity and acceleration in S3 are the same as in the first embodiment.

 次に、本実施形態の変換部1102のマスク情報生成部107は、入退出情報または人物70の画像認識結果に基づいて、地図情報の生成の対象から除外する領域を表すマスク情報を生成する(S22)。 Next, the mask information generation unit 107 of the conversion unit 1102 of the present embodiment generates mask information representing an area to be excluded from the target of map information generation based on the entry / exit information or the image recognition result of the person 70 ( S22).

 そして、本実施形態のSLAM処理部1120のトラッキング部1103は、撮像画像に基づいて、撮像装置17の現在の位置および姿勢を特定する(S4)。この際、トラッキング部1103は、マスク情報に相当する領域については、トラッキング処理の対象外とする。 Then, the tracking unit 1103 of the SLAM processing unit 1120 of the present embodiment identifies the current position and orientation of the image pickup device 17 based on the captured image (S4). At this time, the tracking unit 1103 excludes the area corresponding to the mask information from the tracking process.

 Then, the initial value generation unit 106 of the conversion unit 1102 of the present embodiment generates, from the BIM information, the initial values of the three-dimensional coordinates of points on the structures around the imaging device 17, based on the current position and orientation of the imaging device 17 identified by the tracking unit 103 (S5). Since the area corresponding to the mask information is not a target of map information generation, the initial value generation unit 106 does not generate initial values of the three-dimensional coordinates of points on structures within the area corresponding to the mask information.

 次に、バンドル調整部1104は、バンドル調整処理を実行する(S6)。バンドル調整部1104は、マスク情報に相当する領域については、バンドル調整の対象外とする。 Next, the bundle adjustment unit 1104 executes the bundle adjustment process (S6). The bundle adjustment unit 1104 excludes the area corresponding to the mask information from the bundle adjustment.

 The process of determining whether to end the movement of the information processing device 1 in S7 is the same as in the first embodiment. In the present embodiment, when the movement control unit 105 does not determine that the movement is to be ended (S7 "No"), the acquisition unit 1101 acquires the latest entry/exit information again (S23) and returns to the process of S3.

 また、移動制御部105が移動を終了すると判定した場合(S7“Yes”)、このフローチャートの処理は終了する。 Further, when the movement control unit 105 determines that the movement is completed (S7 “Yes”), the processing of this flowchart ends.

 As described above, the information processing device 1 of the present embodiment generates, based on the environmental information, mask information representing an area in the building 9 to be excluded from the target of map information generation, and does not generate map information for the area corresponding to the mask information. Therefore, according to the information processing device 1 of the present embodiment, elements that may reduce the accuracy of the map information, such as a temporarily present person 70 or object, can be excluded, so the accuracy of the map information can be improved.

 For example, in the present embodiment, the environmental information includes the entry/exit information of the person 70 in the building 9 or the image recognition result of the person 70 in the captured image captured by the imaging device 17 mounted on the information processing device 1, and the information processing device 1 of the present embodiment determines the area in the building 9 where the person 70 is located based on the entry/exit information or the image recognition result of the person 70 and sets the determined area as an area to be excluded from the target of map information generation.

 例えば、建物9が作業現場等である場合、建物9の中には作業者等の人物70が存在する場合がある。このような場合、情報処理装置1は、作業者を地図情報に反映しないことにより、地図情報の精度を向上させることができる。また、このような構成により、本実施形態の情報処理装置1は、周囲の環境が人物等によって変化する場合においてもロバストに処理を実行することができる。 For example, when the building 9 is a work site or the like, a person 70 such as a worker may exist in the building 9. In such a case, the information processing device 1 can improve the accuracy of the map information by not reflecting the worker in the map information. Further, with such a configuration, the information processing apparatus 1 of the present embodiment can robustly execute processing even when the surrounding environment changes depending on a person or the like.

 In the present embodiment, the bundle adjustment process is performed based on the environmental information as in the first embodiment, but the information processing device 1 of the second embodiment does not have to include all the functions of the first embodiment. For example, the information processing device 1 may use the environmental information only for generating the mask information and not for the bundle adjustment process. When this configuration is adopted, the environmental information does not have to include the three-dimensional design information.

 Further, in the present embodiment, an example has been described in which the mask information is used when the information processing device 1 generates map information in real time while moving, but the timing of using the mask information is not limited to this. For example, when the map information generated by the information processing device 1 is updated later, mask information based on entry/exit information or the like at a past time may be used.

(Third Embodiment)
 In the first and second embodiments described above, the correspondence between the three-dimensional coordinate system of the three-dimensional design information, such as the BIM information, and the SLAM coordinate system is defined in advance. In this third embodiment, the correspondence between the three-dimensional coordinate system of the three-dimensional design information and the SLAM coordinate system is adjusted while the information processing device 1 is moving.

 FIG. 10 is a block diagram showing an example of the functions included in the information processing device 1 according to the third embodiment. As shown in FIG. 10, the information processing device 1 of the present embodiment includes an acquisition unit 1101, a marker detection unit 108, a calibration unit 109, a conversion unit 2102, a SLAM processing unit 2120, and a movement control unit 105. The SLAM processing unit 2120 includes a tracking unit 2103 and a bundle adjustment unit 1104. The conversion unit 2102 includes an initial value generation unit 1106 and a mask information generation unit 1107.

 移動制御部105は、第1、第2の実施形態と同様の機能を備える。取得部1101は、第2の実施形態と同様の機能を備える。 The movement control unit 105 has the same functions as those of the first and second embodiments. The acquisition unit 1101 has the same function as that of the second embodiment.

 The marker detection unit 108 detects an AR marker from the captured image. The AR marker has, for example, information on the three-dimensional coordinates representing the position at which the AR marker is placed. These three-dimensional coordinates are consistent with the coordinate system of the BIM information. For example, the AR marker represents, in the coordinate system of the BIM information, the position where the AR marker is installed.

 ARマーカは、本実施形態における指標情報の一例である。ARマーカは、建物9の通路に沿った壁や柱等に設置されているものとする。ARマーカは、具体的には、例えばQRコード(登録商標)等であるが、これに限定されるものではない。また、ARマーカの数は特に限定されるものではないが、1つの建物9あたり複数のARマーカが設置されているものとする。また、マーカ検出部108は、本実施形態における指標検出部の一例である。マーカ検出部108は、ARマーカの検出結果を、キャリブレーション部109に送出する。 The AR marker is an example of index information in this embodiment. It is assumed that the AR marker is installed on a wall, a pillar, or the like along the passage of the building 9. Specifically, the AR marker is, for example, a QR code (registered trademark) or the like, but is not limited thereto. The number of AR markers is not particularly limited, but it is assumed that a plurality of AR markers are installed per building 9. Further, the marker detection unit 108 is an example of the index detection unit in the present embodiment. The marker detection unit 108 sends the detection result of the AR marker to the calibration unit 109.

 キャリブレーション部109は、ARマーカの検出結果に基づいて、内部的に保持している自己位置を表す座標系を、BIM情報の座標系と整合するように調整する。キャリブレーション部109は、本実施形態における座標調整部の一例である。 Based on the detection result of the AR marker, the calibration unit 109 adjusts the coordinate system representing the self-position held internally so as to match the coordinate system of the BIM information. The calibration unit 109 is an example of the coordinate adjustment unit in this embodiment.

 例えば、SLAM処理部2120によって推定された自己位置の変化の軌跡が、補助記憶装置14に保存されるが、情報処理装置1の移動に伴って、自己位置の誤差が蓄積される場合がある。このような場合、推定された自己位置と、BIM情報における建物9の立体モデル上の位置との対応関係に差異が生じる。キャリブレーション部109は、マーカ検出部108によって検出されたARマーカが表す3次元座標に基づいて、現在の情報処理装置1の位置を調整することにより、このような誤差の蓄積を解消する。 For example, the locus of change in self-position estimated by the SLAM processing unit 2120 is stored in the auxiliary storage device 14, but an error in self-position may be accumulated as the information processing device 1 moves. In such a case, there is a difference in the correspondence between the estimated self-position and the position of the building 9 on the three-dimensional model in the BIM information. The calibration unit 109 eliminates the accumulation of such errors by adjusting the current position of the information processing device 1 based on the three-dimensional coordinates represented by the AR marker detected by the marker detection unit 108.

 キャリブレーション部109は、キャリブレーション結果を変換部2102に送出する。例えば、キャリブレーション部109は、自己位置を補正するための変換行列を、変換部2102に送出する。 The calibration unit 109 sends the calibration result to the conversion unit 2102. For example, the calibration unit 109 sends a conversion matrix for correcting the self-position to the conversion unit 2102.
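One way such a transformation could be estimated is sketched below: given AR-marker positions observed in the SLAM coordinate system and their known coordinates in the BIM coordinate system, a rigid transform is fitted with the standard Kabsch/SVD alignment. This particular alignment method and the sample marker coordinates are assumptions, not necessarily what the calibration unit 109 does.

```python
import numpy as np

def rigid_transform(slam_pts, bim_pts):
    """Return R (3x3) and t (3,) such that R @ slam + t approximately equals bim."""
    P = np.asarray(slam_pts, dtype=float)
    Q = np.asarray(bim_pts, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

slam = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 1.5]]    # marker spots seen by SLAM
bim = [[10, 5, 0], [11, 5, 0], [10, 7, 0], [10, 5, 1.5]]  # the same spots in BIM
R, t = rigid_transform(slam, bim)
corrected = R @ np.array([0.5, 1.0, 0.0]) + t             # a self-position in the BIM frame
print(corrected)
```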

 Further, the conversion unit 2102 of the present embodiment has the same functions as in the first and second embodiments and, in addition, converts the environmental information into input values for the self-position estimation process or the map information generation process performed by the SLAM processing unit 2120, based on the self-position adjusted by the calibration unit 109.

 More specifically, the initial value generation unit 1106 has the same functions as in the second embodiment and, in addition, aligns the position of the imaging device 17 identified by the tracking unit 2103 with the position of the imaging device 17 in the BIM information based on the self-position adjusted by the calibration unit 109, and generates, from the BIM information, input values representing the initial values or the ranges of initial values of the bundle adjustment process based on the position and orientation of the imaging device after the alignment.

 For example, the initial value generation unit 1106 identifies, using the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the input values representing the initial values or the ranges of initial values of the bundle adjustment process.

 また、マスク情報生成部1107は、第2の実施形態と同様の機能を備えた上で、キャリブレーション部109によって調整された自己位置に基づいて、マスク情報を生成する。 Further, the mask information generation unit 1107 has the same function as that of the second embodiment, and generates mask information based on the self-position adjusted by the calibration unit 109.

 For example, the mask information generation unit 1107 identifies, using the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the mask information.

 The bundle adjustment unit 1104 of the SLAM processing unit 2120 has the same functions as in the first and second embodiments and, in addition, uses for the bundle adjustment the initial values or the ranges of initial values of the bundle adjustment process that the conversion unit 2102 generated based on the self-position adjusted by the calibration unit 109.

 次に、以上のように構成された本実施形態の情報処理装置1で実行される自己位置推定および地図情報の生成処理の流れについて説明する。 Next, the flow of self-position estimation and map information generation processing executed by the information processing device 1 of the present embodiment configured as described above will be described.

 図11は、第3の実施形態に係る自己位置推定および地図情報の生成処理の流れの一例を示すフローチャートである。 FIG. 11 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the third embodiment.

 S1のBIM情報の取得の処理から、S3の撮像画像およびセンシング結果の取得の処理までは、第2の実施形態と同様である。 The process from the process of acquiring the BIM information of S1 to the process of acquiring the captured image and the sensing result of S3 is the same as that of the second embodiment.

 マーカ検出部108は、撮像画像から、ARマーカを検出する(S31)。マーカ検出部108は、ARマーカの検出結果を、キャリブレーション部109に送出する。 The marker detection unit 108 detects the AR marker from the captured image (S31). The marker detection unit 108 sends the detection result of the AR marker to the calibration unit 109.

 キャリブレーション部109は、ARマーカの検出結果に基づいて、キャリブレーション処理を実行する(S32)。例えば、キャリブレーション部109は、BIM情報の座標系における自己位置を調整するための変換行列を生成する。キャリブレーション部109は、生成した変換行列を変換部2102に送出する。 The calibration unit 109 executes the calibration process based on the detection result of the AR marker (S32). For example, the calibration unit 109 generates a transformation matrix for adjusting the self-position of the BIM information in the coordinate system. The calibration unit 109 sends the generated transformation matrix to the conversion unit 2102.

 The mask information generation unit 1107 identifies, using the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the mask information (S22).

 S4のトラッキング処理は、第1、第2の実施形態と同様であるが、当該処理においても、キャリブレーション部109によるキャリブレーション結果を用いてもよい。 The tracking process of S4 is the same as that of the first and second embodiments, but the calibration result by the calibration unit 109 may also be used in the process.

 例えば、本実施形態において、キャリブレーション部109は、キャリブレーション結果を変換部2102に送出するとしたが、さらに、SLAM処理部2120にキャリブレーション結果を送出してもよい。当該構成を採用する場合、SLAM処理部2120のトラッキング部2103は、キャリブレーション結果に基づく3次元座標を用いて、トラッキング処理を実行する。 For example, in the present embodiment, the calibration unit 109 sends the calibration result to the conversion unit 2102, but may further send the calibration result to the SLAM processing unit 2120. When adopting this configuration, the tracking unit 2103 of the SLAM processing unit 2120 executes the tracking process using the three-dimensional coordinates based on the calibration result.

 Then, the initial value generation unit 1106 identifies, using the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the input values representing the initial values or the ranges of initial values of the bundle adjustment process (S5).

 S6のバンドル調整処理においても、バンドル調整部1104とは、キャリブレーション結果に基づく3次元座標を用いて、トラッキング処理およびバンドル調整処理を実行してもよい。 Also in the bundle adjustment process of S6, the bundle adjustment unit 1104 may execute the tracking process and the bundle adjustment process using the three-dimensional coordinates based on the calibration result.

 S7の情報処理装置1の移動を終了するか否かの判定処理と、S23の入退出情報の取得の処理は、第2の実施形態と同様である。 The process of determining whether or not to end the movement of the information processing device 1 in S7 and the process of acquiring the entry / exit information in S23 are the same as those in the second embodiment.

 As described above, the information processing device 1 of the present embodiment detects, from detection results such as captured images, the index information whose position is expressed in the coordinate system of the BIM information, and converts the environmental information into input values for the self-position estimation process or the map information generation process performed by the SLAM processing unit 2120, based on the coordinate system adjusted using the index information. Therefore, according to the information processing device 1 of the present embodiment, the error between the BIM information and the internal SLAM coordinate system of the information processing device 1 can be reduced, and the self-position estimation and the map information generation can be performed with higher accuracy.

 なお、本実施形態では、指標情報としてARマーカを例示したが、指標情報はこれに限定されるものではない。例えば、指標情報は、Lidarまたは各種センサで捕捉可能な標識等でもよいし、ビーコン等であってもよい。 In the present embodiment, the AR marker is illustrated as the index information, but the index information is not limited to this. For example, the index information may be a sign or the like that can be captured by Lidar or various sensors, or may be a beacon or the like.

 Further, in the present embodiment, the information processing device 1 has been described as having the functions of both the first embodiment and the second embodiment, but the information processing device 1 of the present embodiment does not have to include all the functions of the first and second embodiments. For example, the information processing device 1 may use the environmental information only for the bundle adjustment or only for generating the mask information. The environmental information also only needs to include any one of the three-dimensional design information, the entry/exit information, or the image recognition result of a person.

(Fourth Embodiment)
 In the second embodiment described above, the information processing device 1 uses the captured image for recognizing persons, but the use of the captured image is not limited to this. In this fourth embodiment, the information processing device 1 segments the captured image based on the recognition results of the objects depicted in the captured image, and performs the SLAM processing based on the segmentation result.

 本実施形態の情報処理装置1は、取得部101と、変換部102と、SLAM処理部120と、移動制御部105とを備える。 The information processing device 1 of the present embodiment includes an acquisition unit 101, a conversion unit 102, a SLAM processing unit 120, and a movement control unit 105.

 取得部101は、第1の実施形態と同様の機能を備える。具体的には、取得部101は、デバイスインタフェース15を介して、撮像装置17から撮像画像を取得する。 The acquisition unit 101 has the same function as that of the first embodiment. Specifically, the acquisition unit 101 acquires an captured image from the imaging device 17 via the device interface 15.

 変換部102は、第1の実施形態と同様の機能を備えた上で、取得部101によって取得された撮像画像を、撮像画像に描出された物体の認識結果に基づいてセグメンテーションする。 The conversion unit 102 has the same function as that of the first embodiment, and then segments the captured image acquired by the acquisition unit 101 based on the recognition result of the object drawn on the captured image.

 図12は、第4の実施形態に係る撮像画像60のセグメンテーションの一例を示す図である。図12の左側に示すように、撮像画像60には、情報処理装置1の周囲の環境が描出される。変換部102は、撮像画像60から、物体が描出された画像領域と、各物体の種別とを認識する。本実施形態においては、物体の認識結果は、物体が描出された画像領域の2次元座標と、各物体の種別とが対応付けられた情報とする。 FIG. 12 is a diagram showing an example of segmentation of the captured image 60 according to the fourth embodiment. As shown on the left side of FIG. 12, the captured image 60 depicts the environment around the information processing device 1. The conversion unit 102 recognizes the image area in which the object is drawn and the type of each object from the captured image 60. In the present embodiment, the recognition result of the object is information in which the two-dimensional coordinates of the image area in which the object is drawn and the type of each object are associated with each other.

 本実施形態においては、環境情報は、少なくとも撮像画像60を含むものとする。あるいは、撮像画像60自体ではなく、撮像画像60のセグメンテーション結果を、環境情報の一例としても良い。 In the present embodiment, the environmental information includes at least the captured image 60. Alternatively, the segmentation result of the captured image 60 may be used as an example of the environmental information instead of the captured image 60 itself.

 変換部102は、撮像画像60に描出された物体を認識する。なお、第1の実施形態と同様に、本実施形態においても、「物体」という場合は、壁や柱等の構造物、什器、家具、移動体、仮設物、および人物等を含むものとする。 The conversion unit 102 recognizes the object depicted in the captured image 60. As in the first embodiment, in the present embodiment as well, the term "object" includes structures such as walls and pillars, furniture, furniture, moving objects, temporary objects, people, and the like.

 変換部102は、例えば、ニューラルネットワーク等によって構成された学習済みモデルに撮像画像60を入力することにより、撮像画像60に描出された個々の物体を認識する。図12に示す例では、撮像画像60には、人物70と、箱75a,75bと、柱90と、壁91と、床92とが描出されている。変換部102は、これらの物体を認識する。“人物”、“箱”、“柱”、“壁”、および“床”は、物体の種別の一例である。なお、人物70の認識と、その他の物体の認識とは、別々に実行されても良い。 The conversion unit 102 recognizes the individual objects drawn on the captured image 60 by inputting the captured image 60 into the trained model configured by, for example, a neural network or the like. In the example shown in FIG. 12, the captured image 60 depicts a person 70, boxes 75a and 75b, a pillar 90, a wall 91, and a floor 92. The conversion unit 102 recognizes these objects. "People," "boxes," "pillars," "walls," and "floors" are examples of object types. The recognition of the person 70 and the recognition of other objects may be executed separately.

 The conversion unit 102 segments the captured image 60 based on the recognition results of the objects. The right side of FIG. 12 shows the segmentation result 61 of the captured image 60. In the example shown in FIG. 12, the conversion unit 102 segments the captured image 60 into four regions: the image area in which the pillar 90 and the wall 91 are depicted as region A1, the image area in which the floor 92 is depicted as region A2, the image area in which the boxes 75a and 75b are depicted as region A3, and the image area in which the person 70 is depicted as region A4. The unit of division is not limited to the example shown in FIG. 12. Hereinafter, when the regions A1 to A4 do not need to be distinguished, they are simply referred to as regions A.

 また、変換部102は、認識した物体、つまり、人物70、箱75a,75b、柱90、壁91、および床92を、常設された物体か否かによって分類する。例えば、柱90、壁91、および床92は建物9の一部であるため、常設された物体である。また、人物70、および箱75a,75bは、常設されていない物体である。各物体が常設されているか否かは、例えば、学習済みモデルによって判別される。 Further, the conversion unit 102 classifies the recognized objects, that is, the person 70, the boxes 75a and 75b, the pillar 90, the wall 91, and the floor 92 according to whether or not they are permanent objects. For example, the pillar 90, the wall 91, and the floor 92 are permanent objects because they are part of the building 9. The person 70 and the boxes 75a and 75b are non-permanent objects. Whether or not each object is permanently installed is determined by, for example, a trained model.

 常設された物体とは、一度設置されると、設置位置から移動しない物体である。例えば、上記の柱90、壁91、および床92のように、建物9の一部である物体は、基本的に移動しないため、常設された物体とする。また、常設されていない物体とは、設置位置から移動する可能性の高い物体である。例えば、人物70や、カートやフォークリフト等の移動体、一時的に設置された什器、および荷物の箱75a,75b等は、常設されていない物体とする。 A permanently installed object is an object that does not move from the installation position once it is installed. For example, an object that is a part of the building 9, such as the pillar 90, the wall 91, and the floor 92, is basically a permanent object because it does not move. An object that is not permanently installed is an object that is likely to move from the installation position. For example, a person 70, a moving body such as a cart or a forklift, temporarily installed fixtures, luggage boxes 75a, 75b, etc. are non-permanent objects.

 また、変換部102は、セグメンテーションした領域A1~A4と、各領域に描出された物体が常設された物体か否かを対応付ける。本実施形態においては、セグメンテーションした領域A1~A4と、各領域に描出された物体が常設された物体か否かを対応付けた情報を、セグメンテーション結果という。変換部102は、セグメンテーション結果を、SLAM処理部120に送出する。 Further, the conversion unit 102 associates the segmented areas A1 to A4 with whether or not the object drawn in each area is a permanently installed object. In the present embodiment, the information in which the segmented areas A1 to A4 are associated with whether or not the object drawn in each area is a permanently installed object is referred to as a segmentation result. The conversion unit 102 sends the segmentation result to the SLAM processing unit 120.
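As a non-authoritative illustration, the following sketch shows one way such a segmentation result could be represented, assuming a per-pixel label map produced by a trained segmentation model. The class names and the permanence table are illustrative assumptions, not part of the present disclosure.

```python
import numpy as np

# Classes assumed to correspond to parts of the building (permanently installed).
PERMANENT_CLASSES = {"pillar", "wall", "floor"}

def build_segmentation_result(label_map: np.ndarray, id_to_class: dict) -> dict:
    """Associate each segmented region with a permanence flag.

    label_map  : (H, W) array of integer region ids per pixel.
    id_to_class: mapping from region id to a class name string.
    Returns {region_id: {"mask": bool array, "class": str, "permanent": bool}}.
    """
    result = {}
    for region_id, class_name in id_to_class.items():
        mask = label_map == region_id                 # pixels belonging to this region
        result[region_id] = {
            "mask": mask,
            "class": class_name,
            "permanent": class_name in PERMANENT_CLASSES,
        }
    return result
```

In this sketch the segmentation result sent to the SLAM processing unit 120 is simply a dictionary of region masks, each paired with its permanence flag.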

In the above description, the object recognition and the segmentation based on the result of the object recognition have been described as separate processes, but these processes may be integrated. For example, a trained model that outputs a segmentation result of the captured image 60 when the captured image 60 is input may be adopted. In this case, the conversion unit 102 inputs the captured image 60 into the trained model and obtains the segmentation result output from the trained model.

The methods of object recognition from the captured image 60 and of segmentation of the captured image 60 are not limited to the above examples. For example, the conversion unit 102 may apply machine learning or deep learning techniques other than neural networks to perform the object recognition from the captured image 60 and the segmentation of the captured image 60.

The SLAM processing unit 120 of the present embodiment has the functions of the first embodiment and, in addition, performs the self-position estimation and the map information generation based on the segmentation result of the captured image 60.

For example, the SLAM processing unit 120 identifies the three-dimensional space corresponding to the regions A1 and A2 of the captured image 60 in which permanently installed objects are depicted, and makes that three-dimensional space the target of the self-position estimation process and the map information generation process. The SLAM processing unit 120 also identifies the three-dimensional space corresponding to the regions A3 and A4 of the captured image 60 in which non-permanent objects are depicted, and excludes that three-dimensional space from the self-position estimation process and the map information generation process. In this case, information representing the regions A3 and A4 in which non-permanent objects are depicted may be used as mask information representing regions to be excluded from the map information generation.

Alternatively, the SLAM processing unit 120 may use the regions A1 and A2 of the captured image 60 in which permanently installed objects are depicted for the SLAM processing, and may refrain from using the regions A3 and A4 of the captured image 60 in which non-permanent objects are depicted for the SLAM processing.

The weighting in the SLAM processing may also be changed according to whether or not the objects depicted in the regions A1 to A4 are permanently installed. For example, the SLAM processing unit 120 sets weighting coefficients for the regions A1 to A4 such that the weighting coefficients of the regions A1 and A2, in which permanently installed objects are depicted, are larger than the weighting coefficients of the regions A3 and A4, in which non-permanent objects are depicted. As a result, in the self-position estimation and the map information generation, the influence of the regions A1 and A2 of the captured image 60, in which permanently installed objects are depicted, becomes larger than that of the regions A3 and A4, in which non-permanent objects are depicted.
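A minimal sketch of such permanence-based weighting is shown below, assuming the segmentation result structure from the earlier sketch. The concrete weight values are illustrative assumptions; the description above only requires that permanently installed regions carry a larger weight.

```python
import numpy as np

def build_weight_map(seg_result: dict, shape: tuple,
                     w_permanent: float = 1.0,
                     w_non_permanent: float = 0.2) -> np.ndarray:
    """Return a per-pixel weight map for scoring SLAM residuals."""
    weights = np.full(shape, w_non_permanent, dtype=np.float32)
    for region in seg_result.values():
        if region["permanent"]:
            weights[region["mask"]] = w_permanent
    return weights

# In a weighted optimization, the residual of a point observed at pixel (u, v)
# would then be scaled by weights[v, u], so that permanent regions dominate.
```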

Further, the SLAM processing unit 120 may change the weighting coefficient for each type of object within the regions A in which non-permanent objects are depicted. For example, even among objects that are not permanently installed, some objects are relatively likely to remain at the same position for a long period, while others are relatively unlikely to do so. Large fixtures and furniture, for example, are classified as non-permanent objects because they may be moved, but compared with the person 70 and the like they are more likely to remain at the same position for a long period. Therefore, the SLAM processing unit 120 may set the weighting coefficients such that, among the regions A in which non-permanent objects are depicted, a region A depicting an object that is less likely to move is given a larger weighting coefficient.

The weighting coefficients for the regions A1 to A4 may be set by the conversion unit 102 instead of the SLAM processing unit 120.

As described above, the information processing device 1 of the present embodiment segments the captured image 60 captured by the imaging device 17 based on the recognition result of the objects depicted in the captured image 60, and performs the self-position estimation and the map information generation based on the segmentation result. Therefore, according to the information processing device 1 of the present embodiment, in addition to the effects of the first embodiment, whether an image region is used for the self-position estimation and the map information generation, or the strength of its influence on the self-position estimation and the map information generation, can be adjusted according to the objects depicted in the captured image 60, so that the accuracy of the self-position estimation and the accuracy of the map information can be improved.

The SLAM processing unit 120 described above may also perform the self-position estimation and the map information generation based on the segmentation result of the captured image 60 and three-dimensional design information such as BIM information.

For example, when the conversion unit 102 determines that an object depicted in the captured image 60 is not a permanent object, the SLAM processing unit 120 refers to the three-dimensional design information and determines whether or not the object is included in the design of the building 9. If an object determined not to be permanent based on the captured image 60 is not registered in the three-dimensional design information, the SLAM processing unit 120 adopts the determination result that the object is not permanent as it is. If, on the other hand, an object determined not to be permanent based on the captured image 60 is registered in the three-dimensional design information, the SLAM processing unit 120 changes the determination result from "not a permanent object" to "a permanent object."
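The following sketch illustrates one way this reconciliation could look. The helper `design_contains` is a placeholder assumption standing in for a query against the BIM/3D design data; its name, signature, and lookup logic are not taken from this disclosure.

```python
def design_contains(design_info: dict, class_name: str, position_3d) -> bool:
    """Placeholder lookup: is an element of this class registered in the design?"""
    return class_name in design_info.get("registered_classes", set())

def refine_permanence(region: dict, position_3d, design_info: dict) -> bool:
    """Return the final permanence flag for one segmented region."""
    if region["permanent"]:
        return True                        # image-based judgment kept as-is
    # Image says "not permanent": double-check against the building design.
    if design_contains(design_info, region["class"], position_3d):
        return True                        # registered in the design, so treat as permanent
    return False                           # keep the image-based judgment
```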

The conversion unit 102 may also evaluate, for example as a percentage, the confidence of the determination as to whether an object depicted in the captured image 60 is a permanent object. For example, when the confidence of the determination as to whether an object depicted in the captured image 60 is a permanent object is equal to or less than a reference value, the SLAM processing unit 120 may refer to the three-dimensional design information and determine whether or not the object is included in the design of the building 9. The reference value for the confidence of the determination is not particularly limited.

The process of comparing the three-dimensional design information with the image recognition result may be executed by the conversion unit 102 instead of the SLAM processing unit 120.

By using the result recognized from the captured image 60 by the trained model together with the three-dimensional design information in this way, the accuracy of the determination as to whether the object depicted in each region A is a permanent object can be improved.

The SLAM processing unit 120 may also use the segmentation result of the captured image 60 in the present embodiment in combination with either one or both of the entry/exit information of persons in the second embodiment described above and the image recognition result of the person 70 in the captured image.

(Modification 1)
In the first embodiment described above, the initial values of the three-dimensional coordinates of points on surrounding objects in the SLAM processing are obtained based on three-dimensional design information such as BIM information. In this modification, three-dimensional coordinates of points calculated from distance values estimated from a captured image 60 of the surroundings of the information processing device 1 are adopted as the initial values of the three-dimensional coordinates of points on surrounding objects in the SLAM processing.

For example, the conversion unit 102 of the information processing device 1 estimates, based on the captured image 60, the distance (depth) between an object depicted in the captured image 60 and the imaging device 17. This estimation process is referred to as a depth estimation process.

In this modification, the environmental information includes at least the captured image 60. Alternatively, the distance information estimated from the captured image 60, rather than the captured image 60 itself, may be used as an example of the environmental information.

For example, as described in the first embodiment, when the imaging device 17 is a stereo camera, the conversion unit 102 calculates the depth from the stereo parallax obtained for the captured image captured by one of the cameras included in the stereo camera.

The imaging device 17 may also be a monocular camera. In this case, the conversion unit 102 executes the depth estimation process using machine learning or deep learning techniques. For example, the conversion unit 102 may estimate the distance between an object depicted in the captured image 60 and the imaging device 17 using a trained model that, when the captured image 60 captured by the monocular camera is input, outputs a depth map corresponding to the captured image 60.

The trained model in this modification is, for example, a model that estimates depth from a monocular image by being trained, with stereo images as training data, to estimate the image paired with the monocular image. The method of estimating depth from a monocular image is not limited to this.

The conversion unit 102 sends the distance estimated from the captured image 60 to the SLAM processing unit 120 as an input value for the SLAM processing. More specifically, the conversion unit 102 uses the three-dimensional coordinates of points estimated from the distance estimated from the captured image 60 as initial values in the bundle adjustment process performed by the bundle adjustment unit 104. The conversion unit 102 may specify a range of initial values instead of specifying each initial value as a unique value.
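A minimal sketch of turning an estimated depth map into such initial values is shown below, assuming a pinhole camera model. The intrinsic parameters (fx, fy, cx, cy) and the source of the depth map are assumptions for illustration only.

```python
import numpy as np

def backproject_depth(depth: np.ndarray, fx: float, fy: float,
                      cx: float, cy: float) -> np.ndarray:
    """Convert an (H, W) depth map into an (N, 3) array of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                  # keep pixels with a valid depth

# These camera-frame points, transformed by the current pose estimate of the
# imaging device 17, can serve as initial 3D point coordinates that the
# bundle adjustment then refines.
```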

When the imaging device 17 is installed at a position away from the center of the information processing device 1, the conversion unit 102 or the SLAM processing unit 120 corrects the distance estimated from the captured image 60 based on the positional offset between the imaging device 17 and the center of the information processing device 1.

According to this modification, by using the distance between the object estimated from the captured image 60 and the imaging device 17 as an input value of the SLAM processing, the amount of computation for the bundle adjustment process and the like can be reduced even without three-dimensional design information.

The conversion unit 102 may also generate the input values related to the distances to surrounding objects based on both the distance between the object and the imaging device 17 estimated from the captured image 60 and the distance between the information processing device 1 and the object calculated from the three-dimensional design information. For example, the conversion unit 102 may use, as initial values in the bundle adjustment process, the three-dimensional coordinates of points obtained from the average of the distance estimated from the captured image 60 and the distance calculated from the three-dimensional design information.
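The following sketch shows one possible way of combining the two distance sources, assuming both have already been converted to candidate 3D point coordinates in the same frame; simple averaging, as mentioned above, is only one of several possible choices.

```python
import numpy as np

def combined_initial_points(points_from_image: np.ndarray,
                            points_from_design: np.ndarray) -> np.ndarray:
    """Average per-point estimates of shape (N, 3) to obtain bundle-adjustment initials."""
    return 0.5 * (points_from_image + points_from_design)
```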

(Modification 2)
In the first to fourth embodiments described above, the SLAM processing unit 120 generates a point cloud map, but the form of the three-dimensional representation is not limited to a point cloud map.

For example, the SLAM processing units 120, 1120, and 2120 (hereinafter collectively referred to as the SLAM processing unit 120) may generate, as the map information, a set of a plurality of figures having three-dimensional coordinates.

FIG. 13 is a diagram showing an example of map information according to Modification 2. The map information 500 shown in FIG. 13 is obtained by fitting a plurality of triangular figures (triangular patches, or a triangular-patch-cloud) 501a to 501f (hereinafter referred to as triangular patches 501) to a two-dimensional captured image 45.

Each triangular patch 501 is a planar figure, but its position and orientation can be changed in three-dimensional space. The orientation of a triangular patch 501 is represented by a normal vector n, and its position is represented by three-dimensional coordinates. The position and orientation of each triangular patch 501 correspond to the depth of the two-dimensional captured image 45.

The SLAM processing unit 120 generates three-dimensional map information by optimizing the positions of the center points and the normal vectors of the plurality of triangular patches 501 fitted to the captured image 45.
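A minimal sketch of this patch parameterization is given below, assuming each patch is described by its center point and unit normal as stated above. Only the representation is shown; a real implementation would optimize these parameters by minimizing photometric or reprojection error over all patches.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class TriangularPatch:
    center: np.ndarray   # (3,) 3D coordinates of the patch center point
    normal: np.ndarray   # (3,) unit normal vector n

    def depth_at(self, ray: np.ndarray) -> float:
        """Depth along a viewing ray (unit vector from the camera) to the patch plane."""
        denom = float(self.normal @ ray)
        if abs(denom) < 1e-9:
            return np.inf                 # ray parallel to the patch plane
        return float(self.normal @ self.center) / denom

patches = [TriangularPatch(center=np.array([0.0, 0.0, 2.0]),
                           normal=np.array([0.0, 0.0, 1.0]))]
# Optimizing `center` and `normal` of every patch against the observed images
# yields the three-dimensional map information 500.
```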

By generating the map information as a set of triangular patches 501 in this way, the information processing device 1 of this modification can generate map information that densely represents the surrounding environment while reducing the amount of computation compared with individually calculating the three-dimensional coordinates of points in three-dimensional space.

In FIG. 13, the triangular patches 501 are fitted to the captured image 45, but the triangular patches 501 may instead be fitted to BIM information. For example, the conversion units 102, 1102, and 2102 (hereinafter referred to as the conversion unit 102) may fit the triangular patches 501 to three-dimensional design information such as BIM information. For example, based on the BIM information, the conversion unit 102 can set the boundaries of the triangular patches 501 at boundaries of the three-dimensional structure, in addition to boundaries that appear as edges in the captured image.

When this configuration is adopted, the SLAM processing unit 120 can generate more accurate map information by correcting the positions and orientations of the plurality of triangular patches 501 fitted by the conversion unit 102 based on the SLAM result.

The figures constituting the map information are not limited to the triangular patches 501; the SLAM processing unit 120 may generate the map information using a mesh representation or three-dimensional polygons.

(Modification 3)
In the first to third embodiments described above, the environmental information is the three-dimensional design information, the entry/exit information of persons in the building 9, or the image recognition result of a person in a captured image, but the environmental information is not limited to these.

For example, in this modification, the environmental information includes information on at least one of the ambient lighting and the weather. The information on the ambient lighting is, for example, information indicating whether the lighting of each room or each floor of the building 9 is on or off. The information on the weather is information on sunshine conditions, such as sunny, cloudy, or rainy, in the area including the building 9. The environmental information may include both the information on the ambient lighting and the information on the weather, or only one of them.

For example, the acquisition units 101 and 1101 (hereinafter referred to as the acquisition unit 101) acquire the information on the ambient lighting or the weather from the external device 2.

The conversion unit 102 generates, based on the information on the ambient lighting or the weather acquired by the acquisition unit 101, mask information representing regions in which the captured image is likely to be degraded. The mask information of the second embodiment may be distinguished as first mask information, and the mask information of this modification as second mask information.
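One simple, non-authoritative way to derive such a second mask is to flag saturated pixels, with thresholds adjusted according to the lighting and weather information; the thresholds and the condition keys below are illustrative assumptions.

```python
import numpy as np

def second_mask(gray: np.ndarray, lights_on: bool, weather: str) -> np.ndarray:
    """Return a boolean mask of pixels likely to be over- or under-exposed (gray in [0, 255])."""
    high, low = 250, 5
    if not lights_on:
        low = 30                           # lights off: expect crushed shadows
    if weather == "sunny":
        high = 240                         # strong sunlight: expect blown highlights
    return (gray >= high) | (gray <= low)
```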

In the regions corresponding to the mask information, the SLAM processing unit 120 of this modification does not use the captured image for at least one of the self-position estimation and the map information generation. For example, in the regions corresponding to the mask information, the SLAM processing unit 120 may refrain from using the captured image for both the self-position estimation process and the map information generation process, or only for one of them. For example, in the regions corresponding to the mask information, the SLAM processing unit 120 may use the captured image for the self-position estimation process for movement but not for the map information generation.

Specifically, overexposed (blown-out) regions or underexposed (crushed) regions may occur in the captured image depending on the lighting or sunshine conditions, and using such regions may lower the accuracy of the self-position estimation or the map information. In this modification, by not using the captured image for the self-position estimation or the map information generation in regions where such phenomena may occur, the decrease in the accuracy of the self-position estimation or the map information is reduced.

In the regions corresponding to the mask information, the SLAM processing unit 120 may use the captured image with a lower priority for the self-position estimation or the map information generation instead of not using it at all. For example, when the information processing device 1 includes a sensor or the like that detects the surrounding state in addition to the imaging device 17, the SLAM processing unit 120 uses the detection result of that sensor or the like, in preference to the captured image, for the self-position estimation or the map information generation in the regions corresponding to the mask information.

(Modification 4)
The conversion unit 102 may also change the gradation of the captured image based on the environmental information. For example, the conversion unit 102 reduces blown-out highlights or crushed shadows by changing the dynamic range of the captured image based on the information on the ambient lighting or the weather.

In this case, the SLAM processing unit 120 performs the self-position estimation and the map information generation based on the captured image whose gradation has been changed by the conversion unit 102.
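A minimal sketch of such a gradation change is shown below, using simple gamma correction driven by the environmental information. The gamma values chosen per condition are illustrative assumptions; any tone mapping that widens the usable dynamic range would serve the same purpose.

```python
import numpy as np

def adjust_gradation(gray: np.ndarray, lights_on: bool, weather: str) -> np.ndarray:
    """Return a tone-adjusted copy of a grayscale image with values in [0, 255]."""
    if not lights_on:
        gamma = 0.6                        # brighten dark indoor scenes
    elif weather == "sunny":
        gamma = 1.4                        # tame strong highlights
    else:
        gamma = 1.0
    normalized = gray.astype(np.float32) / 255.0
    return np.clip((normalized ** gamma) * 255.0, 0, 255).astype(np.uint8)
```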

Therefore, according to the information processing device 1 of this modification, the self-position estimation and the map information generation can be performed robustly in response to the surrounding environment, such as the lighting conditions or the sunshine conditions.

(Modification 5)
The environmental information may also include the three-dimensional design information and process information representing the construction process of the building 9.

The process information in this modification is information representing the construction schedule, or timeline, of the building 9. When the building 9 is under construction, collating three-dimensional design information such as BIM information with the process information makes it possible to distinguish between areas of the building 9 where construction has been completed and areas still under construction. Since the three-dimensional design information basically represents a three-dimensional model of the building 9 in its completed state, there is a high possibility that, in areas still under construction, the three-dimensional design information differs from the actual state of the building 9.

The conversion unit 102 of this modification generates, based on the three-dimensional design information and the process information, unfinished area information representing the areas of the building 9 in which construction has not been completed.

For the areas corresponding to the unfinished area information, the SLAM processing unit 120 of this modification performs the self-position estimation and the map information generation without using the three-dimensional design information. For example, for the areas corresponding to the unfinished area information, the SLAM processing unit 120 performs the self-position estimation and the map information generation based on the captured images or the detection results of sensors and the like.
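A minimal sketch of deriving and applying the unfinished area information is given below, assuming the schedule maps each design element id to its planned completion date; the field names and the date-based check are assumptions for illustration.

```python
from datetime import date

def unfinished_areas(design_elements: dict, schedule: dict, today: date) -> set:
    """Return ids of design elements whose construction is not yet complete."""
    unfinished = set()
    for element_id in design_elements:
        completion = schedule.get(element_id)
        if completion is None or completion > today:
            unfinished.add(element_id)
    return unfinished

def usable_design_elements(design_elements: dict, unfinished: set) -> dict:
    """Keep only completed elements as 3D design constraints for the SLAM processing."""
    return {eid: geom for eid, geom in design_elements.items()
            if eid not in unfinished}
```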

Therefore, the information processing device 1 of this modification does not use the three-dimensional design information in areas where the three-dimensional design information is highly likely to differ from the actual state of the building 9, thereby reducing the decrease in the accuracy of the self-position estimation and the map information even when the building 9 is under construction.

In the areas corresponding to the unfinished area information, the SLAM processing unit 120 may use the three-dimensional design information with a lower priority instead of not using it at all.

(Modification 6)
The use of the unfinished area information described in Modification 5 is not limited to the above example.

For example, when generating the map information of the building 9, the SLAM processing unit 120 may generate the map information only for the areas corresponding to the unfinished area information. That is, the SLAM processing unit 120 estimates that the structure of the building 9 does not change in the areas where construction has been completed, and reduces the amount of computation by generating the map information only for the areas where the structure of the building 9 changes, that is, the areas corresponding to the unfinished area information.

When obtaining the position and orientation of the imaging device 17 in the tracking process, the tracking unit 103 of the SLAM processing unit 120 may perform the tracking process using captured images of areas other than the areas corresponding to the unfinished area information. This is because, in the areas corresponding to the unfinished area information, the structures serving as subjects change due to the construction work, so it may be difficult to track the points 50 between captured images captured at different times.
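The following sketch shows one way tracking could be restricted to completed areas, assuming a boolean image mask `unfinished_mask` (True where a pixel shows an unfinished area) and keypoints given as (u, v) pixel coordinates; both the mask projection and the keypoint format are assumptions for illustration.

```python
def keypoints_for_tracking(keypoints, unfinished_mask):
    """Keep only keypoints that fall outside the unfinished areas."""
    kept = []
    for (u, v) in keypoints:
        if not unfinished_mask[int(v), int(u)]:
            kept.append((u, v))
    return kept
```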

(Modification 7)
In the first to third embodiments described above, an example has been described in which the information processing device 1 executes the estimation of the current self-position and the generation of the map information in real time while moving through the building 9, but the execution timing of the self-position estimation process and the map information generation process is not limited to this. For example, after the movement is completed, the information processing device 1 may execute the self-position estimation process or the map information generation process based on the detection results of the surrounding state of the information processing device 1 or the state of the information processing device 1 detected during the movement.

(Modification 8)
In the first to third embodiments described above, the information processing device 1 executes the self-position estimation process and the map information generation process, but a configuration in which the external device 2 executes the self-position estimation process and the map information generation process may also be adopted. For example, the external device 2 may execute the process of estimating the position of the information processing device 1 and the process of generating the map information based on the detection results acquired from the information processing device 1 and the environmental information. In this case, the external device 2 may be regarded as an example of the information processing device.

In the present specification (including the claims), when an expression such as "at least one of a, b, and c" or "at least one of a, b, or c" (including similar expressions) is used, it includes any of a, b, c, a-b, a-c, b-c, and a-b-c. It may also include multiple instances of any element, such as a-a, a-b-b, and a-a-b-b-c-c. It further includes adding elements other than the listed elements (a, b, and c), such as a-b-c-d, which additionally has d.

In the present specification (including the claims), when expressions such as "with data as input," "based on data," "in accordance with data," or "in response to data" (including similar expressions) are used, unless otherwise noted, they include cases where the data itself is used as input and cases where data subjected to some processing (for example, noise-added data, normalized data, or intermediate representations of the data) is used as input. When it is stated that some result is obtained "based on," "in accordance with," or "in response to" data, this includes cases where the result is obtained based only on that data, and may also include cases where the result is obtained under the influence of other data, factors, conditions, and/or states in addition to that data. When it is stated that "data is output," unless otherwise noted, this includes cases where the data itself is used as output and cases where data subjected to some processing (for example, noise-added data, normalized data, or intermediate representations of the data) is used as output.

In the present specification (including the claims), the terms "connected" and "coupled" are intended as non-limiting terms that include direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like. The terms should be interpreted appropriately according to the context in which they are used, and connection/coupling forms that are not intentionally or naturally excluded should be interpreted non-restrictively as being included in the terms.

In the present specification (including the claims), when the expression "A configured to B" is used, it may include that the physical structure of the element A has a configuration capable of executing the operation B and that a permanent or temporary setting (setting/configuration) of the element A is configured or set to actually execute the operation B. For example, when the element A is a general-purpose processor, it suffices that the processor has a hardware configuration capable of executing the operation B and is configured to actually execute the operation B by a permanent or temporary setting of programs (instructions). When the element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it suffices that the circuit structure of the processor is implemented so as to actually execute the operation B, regardless of whether control instructions and data are actually attached.

In the present specification (including the claims), when terms meaning inclusion or possession (for example, "comprising/including" and "having") are used, they are intended as open-ended terms, including the case of containing or possessing objects other than the object indicated by the object of the term. When the object of these terms meaning inclusion or possession is an expression that does not specify a quantity or that suggests the singular (an expression using "a" or "an" as an article), the expression should be interpreted as not being limited to a specific number.

In the present specification (including the claims), even if an expression such as "one or more" or "at least one" is used in one place and an expression that does not specify a quantity or that suggests the singular (an expression using "a" or "an" as an article) is used in another place, the latter expression is not intended to mean "one." In general, expressions that do not specify a quantity or that suggest the singular (expressions using "a" or "an" as an article) should be interpreted as not necessarily being limited to a specific number.

In the present specification, when it is described that a specific advantage/result is obtained for a specific configuration of an embodiment, it should be understood, unless there is a particular reason otherwise, that the advantage is also obtained for one or more other embodiments having that configuration. However, it should be understood that the presence or absence of the advantage generally depends on various factors, conditions, and/or states, and that the advantage is not necessarily obtained by the configuration. The advantage is merely obtained by the configuration described in the embodiments when various factors, conditions, and/or states are satisfied, and the advantage is not necessarily obtained in an invention according to a claim that defines the configuration or a similar configuration.

In the present specification (including the claims), when terms such as "maximize" are used, they include finding a global maximum value, finding an approximation of a global maximum value, finding a local maximum value, and finding an approximation of a local maximum value, and should be interpreted appropriately according to the context in which the terms are used. They also include finding approximations of these maximum values probabilistically or heuristically. Similarly, when terms such as "minimize" are used, they include finding a global minimum value, finding an approximation of a global minimum value, finding a local minimum value, and finding an approximation of a local minimum value, and should be interpreted appropriately according to the context in which the terms are used. They also include finding approximations of these minimum values probabilistically or heuristically. Similarly, when terms such as "optimize" are used, they include finding a global optimum value, finding an approximation of a global optimum value, finding a local optimum value, and finding an approximation of a local optimum value, and should be interpreted appropriately according to the context in which the terms are used. They also include finding approximations of these optimum values probabilistically or heuristically.

In the present specification (including the claims), when a plurality of pieces of hardware perform predetermined processes, the pieces of hardware may cooperate to perform the predetermined processes, or some of the hardware may perform all of the predetermined processes. Some hardware may perform part of a predetermined process, and other hardware may perform the rest of the predetermined process. When an expression such as "one or more pieces of hardware perform a first process and the one or more pieces of hardware perform a second process" is used in the present specification (including the claims), the hardware that performs the first process and the hardware that performs the second process may be the same or different. That is, it suffices that the hardware that performs the first process and the hardware that performs the second process are included in the one or more pieces of hardware. The hardware may include an electronic circuit, a device including an electronic circuit, or the like.

In the present specification (including the claims), when a plurality of storage devices (memories) store data, each of the plurality of storage devices (memories) may store only part of the data or may store the whole of the data.

As described above, according to the first to third embodiments, the accuracy of the self-position estimation and of the map information can be improved.

Although the embodiments of the present disclosure have been described in detail above, the present disclosure is not limited to the individual embodiments described above. Various additions, changes, replacements, partial deletions, and the like are possible without departing from the conceptual idea and spirit of the present invention derived from the contents defined in the claims and their equivalents. For example, in all of the embodiments described above, numerical values and mathematical expressions used in the description are shown as examples and are not limiting. The order of the operations in the embodiments is also shown as an example and is not limiting.

Claims (17)

1. An information processing device comprising:
at least one memory; and
at least one processor,
wherein the at least one processor is configured to:
acquire a detection result including either a surrounding state of the information processing device or a state of the information processing device, and environmental information regarding an environment around the information processing device; and
perform self-position estimation and generation of map information based on the environmental information and the detection result.
2. The information processing device according to claim 1, wherein the environmental information includes three-dimensional design information of a building.
3. The information processing device according to claim 2, wherein the at least one processor:
generates information regarding positions of surrounding objects based on the three-dimensional design information; and
performs the self-position estimation and the generation of the map information based on the information regarding the positions of the surrounding objects.
4. The information processing device according to claim 3, wherein:
the detection result includes a plurality of captured images captured by an imaging device mounted on the information processing device; and
the at least one processor estimates a position and an orientation of the imaging device based on the plurality of captured images, and generates the information regarding the positions of the surrounding objects based on the three-dimensional design information and the position and orientation of the imaging device.
5. The information processing device according to claim 4, wherein the at least one processor:
estimates positions of planes or curved surfaces of the surrounding objects based on the three-dimensional design information; and
generates the information regarding the positions of the surrounding objects based on a constraint condition that a plurality of points existing in the surroundings are located on the planes or the curved surfaces.
6. The information processing device according to any one of claims 2 to 5, wherein the at least one processor:
detects, from the detection result, index information whose position is expressed in a coordinate system of the three-dimensional design information; and
adjusts a coordinate system representing the self-position so as to be consistent with the coordinate system of the three-dimensional design information based on a detection result of the index information.
7. The information processing device according to any one of claims 2 to 6, wherein the at least one processor:
fits a set of a plurality of figures having three-dimensional coordinates to the three-dimensional design information;
corrects positions and orientations of the plurality of figures based on a result of the self-position estimation; and
generates, as the map information, the set of the plurality of figures whose positions and orientations have been corrected.
8. The information processing device according to any one of claims 2 to 7, wherein:
the environmental information includes the three-dimensional design information and process information representing a construction process of the building; and
the at least one processor generates, based on the three-dimensional design information and the process information, unfinished area information representing an area of the building in which construction has not been completed, and performs the self-position estimation and the generation of the map information for an area corresponding to the unfinished area information without using the three-dimensional design information.
9. The information processing device according to any one of claims 1 to 8, wherein the at least one processor:
generates, based on the environmental information, mask information representing an area to be excluded from the generation of the map information; and
does not generate the map information for an area corresponding to the mask information.
10. The information processing device according to claim 9, wherein:
the environmental information includes either entry/exit information of a person in a building in which the information processing device is located or an image recognition result of a person in a captured image captured by an imaging device mounted on the information processing device; and
the at least one processor generates the mask information based on either the entry/exit information or the image recognition result of the person.
11. The information processing device according to any one of claims 1 to 10, wherein the at least one processor:
divides a captured image captured by an imaging device mounted on the information processing device into regions based on a recognition result of objects depicted in the captured image; and
performs the self-position estimation and the generation of the map information based on a result of the region division.
12. The information processing device according to any one of claims 1 to 10, wherein:
the environmental information includes information regarding either ambient lighting or weather;
the detection result includes a captured image captured by an imaging device mounted on the information processing device; and
the at least one processor generates, based on the environmental information, second mask information representing a region in which the captured image is likely to be degraded, and does not use the captured image in at least one of the self-position estimation and the generation of the map information for a region corresponding to the second mask information.
13. The information processing device according to any one of claims 1 to 12, wherein:
the environmental information includes information regarding either ambient lighting or weather;
the detection result includes a captured image captured by an imaging device mounted on the information processing device; and
the at least one processor changes a gradation of the captured image based on the environmental information, and performs the self-position estimation and the generation of the map information based on the captured image whose gradation has been changed.
14. The information processing device according to any one of claims 1 to 13, wherein:
the environmental information includes a captured image captured by an imaging device mounted on the information processing device; and
the at least one processor generates information regarding positions of surrounding objects based on the captured image, and performs the self-position estimation and the generation of the map information based on the information regarding the positions of the surrounding objects.
15. The information processing device according to any one of claims 1 to 14, wherein the at least one processor:
calculates positions of surrounding objects as spatial coordinates of a plurality of points in a three-dimensional space; and
outputs the calculated spatial coordinates of the plurality of points as the map information.
16. An information processing method comprising:
acquiring, by at least one processor, a detection result including either a surrounding state of an information processing device or a state of the information processing device, and environmental information regarding an environment around the information processing device; and
performing, by the at least one processor, self-position estimation and generation of map information based on the environmental information and the detection result.
17. A program causing at least one computer to execute:
acquiring a detection result including at least either a surrounding state of an information processing device or a state of the information processing device, and environmental information regarding an environment around the information processing device; and
performing self-position estimation and generation of map information based on the environmental information and the detection result.

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004133567A (en) * 2002-10-09 2004-04-30 Hitachi Ltd Moving object and its position detecting device
JP2008129614A (en) * 2006-11-16 2008-06-05 Toyota Motor Corp Mobile system
JP2018164966A (en) * 2017-03-28 2018-10-25 Shimizu Corporation POSITION ESTIMATION DEVICE, ROBOT, AND POSITION ESTIMATION METHOD

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023070207A (en) * 2021-11-09 2023-05-19 Mitsubishi Electric Corporation Image recognition device, image recognition system, and image recognition method
JP7720771B2 (en) 2021-11-09 2025-08-08 Mitsubishi Electric Corporation Image recognition device, image recognition system, and image recognition method
WO2023085183A1 (en) * 2021-11-10 2023-05-19 Sony Group Corporation Information processing device, information processing method, and mobile object
JP2023122807A (en) * 2022-02-24 2023-09-05 Hitachi Global Life Solutions, Inc. Autonomous robot
JP2025084676A (en) * 2023-11-22 2025-06-03 Delta Electronics, Inc. Computer program product for 3D modeling and method for removing moving objects therefrom
JP7765572B2 (en) 2023-11-22 2025-11-06 Delta Electronics, Inc. Computer program product for 3D modeling and method for removing moving objects therefrom
WO2025197855A1 (en) * 2024-03-19 2025-09-25 Aisin Corporation Map creation device, map creation method, and map creation program

Similar Documents

Publication Publication Date Title
WO2021210492A1 (en) Information processing device, information processing method, and program
US8401242B2 (en) Real-time camera tracking using depth maps
WO2019138678A1 (en) Information processing device, control method for same, program, and vehicle driving assistance system
JP7131994B2 (en) Self-position estimation device, self-position estimation method, self-position estimation program, learning device, learning method and learning program
JP5480667B2 (en) Position / orientation measuring apparatus, position / orientation measuring method, program
JP2020030204A (en) Distance measurement method, program, distance measurement system and movable object
CN113870343A (en) Relative pose calibration method, device, computer equipment and storage medium
CN103926933A (en) Indoor simultaneous locating and environment modeling method for unmanned aerial vehicle
JP2019125116A (en) Information processing device, information processing method, and program
CN110260866A (en) A robot localization and obstacle avoidance method based on a vision sensor
CN110764110B (en) Path navigation method, device and computer readable storage medium
CN112967340A (en) Simultaneous positioning and map construction method and device, electronic equipment and storage medium
KR20210116161A (en) Heterogeneous sensors calibration method and apparatus using single checkerboard
Hsu et al. Application of multisensor fusion to develop a personal location and 3D mapping system
JP2020149186A (en) Position / orientation estimation device, learning device, mobile robot, position / orientation estimation method, learning method
JP2022138037A (en) Information processing device, information processing method and program
WO2022014322A1 (en) Information processing system and information processing device
US20190369631A1 (en) Intelligent wheelchair system based on big data and artificial intelligence
WO2022198508A1 (en) Lens abnormality prompt method and apparatus, movable platform, and readable storage medium
KR20240123863A (en) Greenhouse robot and method for estimating its position and posture
US20240106998A1 (en) Miscalibration detection for virtual reality and augmented reality systems
US11294379B2 (en) Systems and methods for controlling intelligent wheelchair
CN120510212A (en) Robust visual inertia SLAM method, storage medium and device integrating dotted line flow characteristics
US20240069203A1 (en) Global optimization methods for mobile coordinate scanners
CN117541655A (en) A method to integrate visual semantics to eliminate z-axis cumulative error in radar mapping

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21788847
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21788847
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: JP