
WO2021210492A1 - Information processing device, information processing method, and program - Google Patents

Information processing device, information processing method, and program

Info

Publication number
WO2021210492A1
WO2021210492A1 (PCT/JP2021/014938)
Authority
WO
WIPO (PCT)
Prior art keywords
information
processing device
information processing
unit
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/014938
Other languages
French (fr)
Japanese (ja)
Inventor
貴之 猿田
亮 水谷
達雄 古賀
仁紀 木内
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kajima Corp
Preferred Networks Inc
Original Assignee
Kajima Corp
Preferred Networks Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kajima Corp, Preferred Networks Inc filed Critical Kajima Corp
Publication of WO2021210492A1 publication Critical patent/WO2021210492A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/10Simultaneous control of position or course in three dimensions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B29/00Maps; Plans; Charts; Diagrams, e.g. route diagram

Definitions

  • An embodiment of the present invention relates to an information processing device, an information processing method, and a program.
  • Robots and the like are known that estimate their own position and generate map information by recognizing the positions and shapes of surrounding objects from the sensing results of sensors or from captured images.
  • The information processing device of the embodiment includes at least one memory and at least one processor. The at least one processor acquires a detection result, which includes at least one of the surrounding state of the information processing device and the state of the information processing device, together with environmental information about the environment around the information processing device, and estimates the self-position and generates map information based on the environmental information and the detection result.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of the information processing apparatus according to the first embodiment.
  • FIG. 2 is a block diagram showing an example of a function provided in the information processing apparatus according to the first embodiment.
  • FIG. 3 is an image diagram showing an example of tracking processing according to the first embodiment.
  • FIG. 4 is an image diagram showing an example of the positional relationship between the information processing apparatus according to the first embodiment and surrounding objects.
  • FIG. 5 is an image diagram showing an example of bundle adjustment according to the first embodiment.
  • FIG. 6 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the first embodiment.
  • FIG. 7 is a block diagram showing an example of the functions provided in the information processing apparatus according to the second embodiment.
  • FIG. 8 is an image diagram showing an example of the positional relationship between the information processing apparatus according to the second embodiment and surrounding objects.
  • FIG. 9 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the second embodiment.
  • FIG. 10 is a block diagram showing an example of the functions provided in the information processing apparatus according to the third embodiment.
  • FIG. 11 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the third embodiment.
  • FIG. 12 is a diagram showing an example of segmentation of a captured image according to a fourth embodiment.
  • FIG. 13 is a diagram showing an example of map information according to the second modification.
  • FIG. 1 is a block diagram showing an example of the hardware configuration of the information processing apparatus 1 according to the first embodiment.
  • the information processing device 1 includes a main body 10, a moving device 16, an imaging device 17, and an IMU (Inertial Measurement Unit) sensor 18.
  • the moving device 16 is a device capable of moving the information processing device 1.
  • the moving device 16 has a plurality of wheels and a motor for driving these wheels, and is connected to the lower part of the main body 10 so as to support the main body 10.
  • The moving device 16 can move the information processing device 1 in, for example, a building under construction, a completed building, a station platform, or a factory.
  • In the present embodiment, the case where the information processing device 1 moves in a building under construction will be described as an example.
  • The means of locomotion of the information processing device 1 is not limited to wheels, and may be caterpillars, propellers, or the like.
  • The information processing device 1 is, for example, a robot, a drone, or the like. In the present embodiment, the information processing device 1 is assumed to move autonomously, but the information processing device 1 is not limited to this.
  • the image pickup device 17 is, for example, a stereo camera in which two cameras arranged side by side are set as one set.
  • the image pickup device 17 transmits the captured image data captured by the two cameras to the main body 10 in association with each other.
  • the IMU sensor 18 is a sensor in which a gyro sensor, an acceleration sensor, and the like are integrated, and measures the angular velocity and acceleration of the information processing device 1.
  • the IMU sensor 18 sends the measured angular velocity and acceleration to the main body 10.
  • the IMU sensor 18 may further include not only a gyro sensor and an acceleration sensor, but also a magnetic sensor, a GPS (Global Positioning System) device, and the like.
  • the image pickup device 17 and the IMU sensor 18 are collectively referred to as a detection unit.
  • the detection unit may further include various sensors.
  • the information processing device 1 may further include a distance measuring sensor such as an ultrasonic sensor or a laser scanner.
  • the term "detection” refers to imaging the surroundings of the information processing device 1, measuring the angular velocity or acceleration of the information processing device 1, and the distance to an object around the information processing device 1. It shall include measuring the distance.
  • the detection result by the detection unit includes at least one of the surrounding state of the information processing device 1 and the state of the information processing device 1.
  • the detection result may include both information about the surrounding state of the information processing device 1 and the state of the information processing device 1, or may relate to the surrounding state of the information processing device 1 and the state of the information processing device 1. It may contain only one of the information.
  • the surrounding state of the information processing device 1 is, for example, an captured image of the surroundings of the information processing device 1, a distance measurement result of a distance between an object around the information processing device 1 and the information processing device 1.
  • the state of the information processing device 1 is, for example, the angular velocity and acceleration measured by the IMU sensor 18.
  • the captured image captured by the imaging device 17 is an example of the detection result of the surrounding state of the information processing device 1.
  • the detection result includes at least the captured image, but may further include other information.
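  • As an illustration only (these names and types are not from the patent), the detection result described above can be pictured as a simple container that always carries a captured image and optionally carries IMU and range measurements; a minimal Python sketch follows.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class DetectionResult:
    """Hypothetical container for one detection cycle (illustrative names only)."""
    left_image: np.ndarray                          # stereo pair from the imaging device 17
    right_image: np.ndarray
    angular_velocity: Optional[np.ndarray] = None   # rad/s, from the IMU sensor 18
    acceleration: Optional[np.ndarray] = None       # m/s^2, from the IMU sensor 18
    range_scan: Optional[np.ndarray] = None         # optional distance-sensor measurement
```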
  • The main body 10 may be realized as a computer including a processor 11, a main storage device 12 (memory), an auxiliary storage device 14 (memory), a network interface 13, and a device interface 15, which are connected via a bus 19.
  • the image pickup device 17 and the IMU sensor 18 may be incorporated in the main body 10.
  • The processor 11 may be an electronic circuit (processing circuit or processing circuitry) including a control device and an arithmetic device of a computer, such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), or an ASIC (Application Specific Integrated Circuit). Further, the processor 11 may be a semiconductor device or the like including a dedicated processing circuit. The processor 11 is not limited to an electronic circuit using electronic logic elements, and may be realized by an optical circuit using optical logic elements. Further, the processor 11 may include a calculation function based on quantum computing.
  • the processor 11 can perform arithmetic processing based on the data and software (program) input from each apparatus and the like of the internal configuration of the information processing apparatus 1, and output the arithmetic result and the control signal to each apparatus and the like.
  • the processor 11 may control each component constituting the information processing device 1 by executing an OS (Operating System) of the information processing device 1, an application, or the like.
  • the main storage device 12 is a storage device that stores instructions executed by the processor 11, various data, and the like, and the information stored in the main storage device 12 is read out by the processor 11.
  • the auxiliary storage device 14 is a storage device other than the main storage device 12. Note that these storage devices mean arbitrary electronic components capable of storing electronic information, and may be semiconductor memories.
  • the semiconductor memory may be either a volatile memory or a non-volatile memory.
  • The storage device that stores various data in the information processing device 1 in the present embodiment may be realized by the main storage device 12 or the auxiliary storage device 14, or by a memory built into the processor 11.
  • the main storage device 12 or the auxiliary storage device 14 is also referred to as a storage unit.
  • A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor.
  • When the information processing device 1 in the present embodiment is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to that storage device (memory), it may include a configuration in which at least one of the plurality of processors is connected (coupled) to the at least one storage device (memory). Further, this configuration may be realized by storage devices (memories) and processors included in a plurality of computers. Further, a configuration in which the storage device (memory) is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.
  • the network interface 13 is an interface for connecting to the communication network 3 wirelessly or by wire.
  • an appropriate interface such as one conforming to an existing communication standard may be used.
  • Information may be exchanged with the external device 2 connected via the communication network 3 by the network interface 13.
  • The communication network 3 may be any of a WAN (Wide Area Network), a LAN (Local Area Network), a PAN (Personal Area Network), or a combination thereof, as long as information is exchanged between the information processing device 1 and the external device 2.
  • An example of a WAN is the Internet, examples of a LAN are IEEE 802.11 and Ethernet (registered trademark), and examples of a PAN are Bluetooth (registered trademark) and NFC (Near Field Communication).
  • the device interface 15 is an interface that directly connects to the mobile device 16, the image pickup device 17, and the IMU sensor 18.
  • the device interface 15 is an interface that conforms to a standard such as USB (Universal Serial Bus), but is not limited thereto. Further, the device interface 15 may be further connected to an external device other than the various devices shown in FIG.
  • the external device 2 is, for example, a server device or the like.
  • the external device 2 is connected to the information processing device 1 via a communication network 3.
  • the external device 2 of this embodiment stores the three-dimensional design information of the building in advance.
  • the three-dimensional design information of the building is, for example, BIM (Building Information Modeling) information.
  • BIM information includes information on the three-dimensional structure of a building and information on materials such as building materials.
  • the three-dimensional design information of the building is not limited to BIM information, and may be 3D CAD (Computer-Aided Design) data or the like.
  • the three-dimensional design information is an example of environmental information in this embodiment.
  • the environmental information is information about the environment around the information processing device 1.
  • the environmental information includes at least one of information about a building in which the information processing device 1 travels, information about a person or an object existing around the information processing device 1, information about the weather, and information about lighting.
  • the above-mentioned three-dimensional design information is an example of information on the building in which the information processing device 1 travels among the environmental information.
  • the environmental information may be a combination of a plurality of types of information, or may include only one type of information.
  • the environmental information includes at least three-dimensional design information, but may further include other information regarding the environment around the information processing device 1.
  • The term "object" in this embodiment includes structures such as walls and pillars, furniture, moving objects, temporary objects, people, and the like.
  • the information processing device 1 and the external device 2 are wirelessly connected, but the information processing device 1 and the external device 2 may be connected by wire. Further, the information processing device 1 does not have to be always connected to the external device 2.
  • FIG. 2 is a block diagram showing an example of the functions included in the information processing apparatus 1 according to the first embodiment.
  • The information processing device 1 includes an acquisition unit 101, a conversion unit 102, a SLAM (Simultaneous Localization and Mapping) processing unit 120, and a movement control unit 105. Further, the SLAM processing unit 120 includes a tracking unit 103 and a bundle adjustment unit 104.
  • the acquisition unit 101 acquires the detection result of the surrounding state of the information processing device 1 or the state of the information processing device 1 and the environmental information regarding the environment around the information processing device 1.
  • the acquisition unit 101 acquires BIM information from the external device 2 via, for example, the network interface 13.
  • the acquisition unit 101 stores the acquired BIM information in the auxiliary storage device 14.
  • The acquisition unit 101 acquires a captured image from the imaging device 17 via the device interface 15. In addition, the acquisition unit 101 acquires the angular velocity and acceleration from the IMU sensor 18 via the device interface 15.
  • the conversion unit 102 converts the environmental information into an input value of at least one of the self-position estimation process by the SLAM processing unit 120 described later or the map information generation process.
  • the conversion unit 102 may convert the environmental information into the input values of both the self-position estimation process and the map information generation process, or may convert only the input values of either process.
  • conversion includes generating other information from the environmental information or extracting, acquiring or searching the information from the environmental information.
  • The conversion unit 102 generates, from the BIM information, the initial values of the three-dimensional coordinates (world coordinates) of points in three-dimensional space used in the bundle adjustment process by the bundle adjustment unit 104 described later.
  • Based on the current position and orientation of the imaging device 17 specified by the tracking unit 103 described later, the conversion unit 102 identifies, among the structures such as walls and columns included in the BIM information, the structures contained in the imaging range of the imaging device 17. Then, the conversion unit 102 specifies, from the BIM information, the three-dimensional coordinates (world coordinates) of the structures included in the imaging range of the imaging device 17. As an example, the conversion unit 102 acquires, from an external device or the like, the world coordinates of one point of the building represented by the BIM information and, with that point as a reference, converts the three-dimensional coordinates in the BIM information of each point included in the building into world coordinates.
  • the method of obtaining the world coordinates of each point included in the building from the BIM information is not limited to this.
  • the conversion unit 102 sends the specified three-dimensional coordinates (world coordinates) to the bundle adjustment unit 104 as the initial value of the three-dimensional coordinates (world coordinates) of the points in the three-dimensional space in the bundle adjustment process described later.
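  • The following is a minimal, hypothetical sketch (not from the patent) of this conversion step: given a calibration between the BIM frame and the world/SLAM frame and the current camera pose from the tracking unit 103, BIM structure points that fall inside the imaging range are transformed to world coordinates and handed to the bundle adjustment as initial values. All function and variable names are illustrative.

```python
import numpy as np

def bim_points_to_initial_values(bim_points, R_bw, t_bw, R_wc, t_wc, K,
                                 image_size, max_depth=50.0):
    """Hypothetical sketch of the conversion unit 102.

    bim_points: (N, 3) structure points in the BIM coordinate system.
    R_bw, t_bw: calibration from the BIM frame to the world (SLAM) frame.
    R_wc, t_wc: current camera pose from the tracking unit 103 (world -> camera).
    K: 3x3 camera intrinsics; image_size: (width, height).
    Returns world coordinates of BIM points inside the imaging range, to be used
    as initial values of the bundle adjustment.
    """
    # BIM coordinates -> world coordinates
    pts_world = bim_points @ R_bw.T + t_bw

    # world coordinates -> camera coordinates
    pts_cam = pts_world @ R_wc.T + t_wc

    # keep only points in front of the camera and within a depth limit
    valid = (pts_cam[:, 2] > 0.1) & (pts_cam[:, 2] < max_depth)
    pts_world, pts_cam = pts_world[valid], pts_cam[valid]

    # project and keep points that land inside the image
    uv = pts_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]
    w, h = image_size
    in_image = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)

    return pts_world[in_image]
```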
  • the details of the bundle adjustment process will be described later.
  • the three-dimensional coordinates (world coordinates) of the points included in the building 9 specified by the conversion unit 102 from the BIM information are examples of information regarding the positions of surrounding objects in the present embodiment.
  • the three-dimensional coordinates in the present embodiment are world coordinates.
  • the conversion unit 102 may specify the range of the initial value without specifying the initial value as a unique value.
  • the conversion unit 102 may provide a range instead of specifying the three-dimensional coordinates of a point in the three-dimensional space as unique coordinates.
  • In this case, a region of three-dimensional space that is likely to contain a point on a structure within the imaging range of the imaging device 17 is sent to the bundle adjustment unit 104 as the range of initial values.
  • the initial value or the range of the initial value is an example of the input value generated by the conversion unit 102 in the present embodiment.
  • the calibration means that the correspondence between the position in the BIM information and the position in the SLAM coordinate system is defined.
  • the position of the movement start point of the information processing device 1 in the BIM information in the three-dimensional coordinate system may be stored in the auxiliary storage device 14 as a reference point. Therefore, the conversion unit 102 can specify the position in the building represented by the BIM information corresponding to the position of the image pickup device 17 specified by the tracking unit 103.
  • The calibration between the three-dimensional coordinate system of the BIM information and the SLAM coordinate system may be executed through an input operation by an administrator or the like, or by the SLAM processing unit 120 recognizing, from the captured image, an index such as an AR (Augmented Reality) marker installed in the building.
  • the SLAM processing unit 120 simultaneously estimates the self-position and generates map information.
  • the self-position is the position and orientation of the information processing device 1.
  • In the present embodiment, the position and orientation of the imaging device 17 are treated as representing the position and posture of the information processing device 1.
  • Alternatively, the SLAM processing unit 120 may estimate the self-position, that is, the position of the information processing device 1, by correcting the displacement between the imaging device 17 and the center of the information processing device 1.
  • the map information represents the shape of the surrounding structure along the movement locus of the information processing device 1. More specifically, the map information of the present embodiment represents the internal structure of the building in which the information processing device 1 travels in three dimensions along the movement locus of the information processing device 1.
  • the map information of the present embodiment is, for example, a point cloud map in which the internal structure of the building in which the information processing device 1 travels is represented as a point cloud having three-dimensional coordinates.
  • the type of map information is not limited to this, and the map may be represented by a set of three-dimensional figures instead of a point cloud.
  • the map information is also called an environment map.
  • the SLAM processing unit 120 is an example of an estimation unit in this embodiment.
  • As a method for estimating the self-position and generating map information, a method other than SLAM may be adopted. Further, the estimation of the self-position and the generation of the map information do not have to be performed at the same time, and one process may be completed before the other is executed.
  • The generation of map information in the present specification includes at least one of newly generating map information, adjusting already generated map information, and updating already generated map information.
  • the SLAM processing unit 120 includes a tracking unit 103 and a bundle adjustment unit 104.
  • the tracking unit 103 identifies the position and orientation of the image pickup device 17 by tracking a plurality of captured images captured by the image pickup device 17 at different times.
  • the tracking unit 103 is an example of a specific unit in the present embodiment.
  • the image pickup device 17 captures the surroundings while moving as the information processing device 1 moves.
  • the tracking unit 103 calculates changes in the position and posture of the image pickup device 17 by tracking points drawn on a certain captured image on another captured image captured at different times.
  • the tracking unit 103 specifies the current position and orientation of the imaging device 17 by adding changes in the position and orientation specified by the tracking process to the position and orientation of the imaging device 17 at the start of imaging.
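  • As a rough illustration of the accumulation step just described (not taken from the patent), the current pose can be obtained by composing the previously corrected pose with the relative motion estimated by tracking; the names below are hypothetical.

```python
import numpy as np

def accumulate_pose(R_prev, t_prev, R_rel, t_rel):
    """Compose the pose at the previous (key) frame with the relative motion
    estimated by the tracking process to obtain the current camera pose.

    (R_prev, t_prev): rotation and translation of the imaging device at the
    previous frame, expressed in world coordinates.
    (R_rel, t_rel): relative rotation and translation between that frame and
    the current frame, as estimated by tracking.
    """
    R_cur = R_prev @ R_rel
    t_cur = R_prev @ t_rel + t_prev
    return R_cur, t_cur
```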
  • FIG. 3 is an image diagram showing an example of tracking processing according to the first embodiment.
  • the reference frame 41 and the target frame 42 are captured images captured at different times by the imaging device 17.
  • The reference frame 41 is a captured image captured before the target frame 42, and the imaging device 17 is assumed to have moved from the position T_i at the time the reference frame 41 was captured to the position T_j at the time the target frame 42 was captured.
  • the reference frame 41 is also referred to as a key frame
  • the target frame 42 is also referred to as a current frame.
  • The tracking unit 103 calculates the relative amount of movement of the imaging device 17 from the position T_i to the position T_j by computing the photometric error between the point P depicted in the reference frame 41 and the corresponding point depicted in the target frame 42.
  • the point P is, for example, a feature point on the reference frame 41.
  • the movement of the image pickup apparatus 17 includes both a change in the position of the image pickup apparatus 17 and a change in the posture (orientation).
  • The position T_i at the time the reference frame 41 was captured is assumed to have already been error-corrected.
  • the point 50a shown in FIG. 3 represents the position where the point P drawn on the reference frame 41 is back-projected on the three-dimensional space.
  • The tracking unit 103 calculates the photometric error E_pj between the reference frame 41 and the target frame 42 using the following equation (1).
  • In equation (1), I_i represents the reference frame 41 and I_j represents the target frame 42.
  • N_p is a neighborhood pattern of pixels including the point P on the reference frame 41.
  • t_i represents the exposure time of the reference frame 41, and t_j represents the exposure time of the target frame 42.
  • p' is the point obtained by projecting the point P onto the target frame 42 using the inverse depth d_p.
  • The tracking unit 103 calculates the photometric error E_pj using the Huber norm.
  • The weighting coefficient W_p is calculated in advance based on the brightness gradient of the pixels. For example, noise can be reduced by lowering the value of the weighting coefficient W_p for pixels with a larger gradient.
  • The luminance conversion hyperparameters a_i, a_j, b_i, and b_j are parameters for converting the luminance of the reference frame 41 and the target frame 42, and may be tuned manually, for example by an administrator.
  • The following equation (2) is a constraint condition on the point p', which is the projection point of the point P used in equation (1).
  • In equation (2), a back-projection function that back-projects the point P drawn on the reference frame 41 to the point 50a in three-dimensional space and a projection function that projects the point 50a in three-dimensional space onto the target frame 42 are used.
  • The distance from the point P to the point 50a is the depth (d_p) of the point 50a in the reference frame 41.
  • The coefficient R included in equation (2) represents the amount of rotation of the imaging device 17.
  • The coefficient t represents the amount of translation of the imaging device 17.
  • The coefficient R and the coefficient t are defined from the relative positions of the imaging device 17 according to the following constraint condition (3).
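  • The equations themselves are not reproduced in this text. As a hedged reconstruction consistent with the description above (Huber norm, exposure times t_i and t_j, brightness-transfer parameters a_i, a_j, b_i, b_j, weight W_p, and projection with inverse depth d_p), a standard direct-method photometric error of the following form may be assumed, where ||·||_γ denotes the Huber norm, Π the projection function, Π⁻¹ the back-projection function, and T_i, T_j the poses at the times the two frames were captured; the exact expressions in the patent figures may differ.

```latex
% Hedged reconstruction of equations (1)-(3); notation follows the description above.
\begin{align*}
E_{pj} &= \sum_{p \in N_p} W_p \,
  \Bigl\| \bigl(I_j[\,p'\,] - b_j\bigr)
  - \frac{t_j\, e^{a_j}}{t_i\, e^{a_i}} \bigl(I_i[\,p\,] - b_i\bigr) \Bigr\|_{\gamma}
  && \text{(1)} \\
p' &= \Pi\!\bigl( R\, \Pi^{-1}(p,\, d_p) + t \bigr)
  && \text{(2)} \\
\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}
  &= T_j\, T_i^{-1}
  && \text{(3)}
\end{align*}
```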
  • The tracking unit 103 specifies the position T_j of the imaging device 17 at the time the target frame I_j was captured by solving the model of the photometric error E_pj between the reference frame I_i and the target frame I_j shown in the above equations (1) to (3).
  • The positions T_i and T_j shown in equation (3) and FIG. 3 include both the position and the orientation of the imaging device 17. In this way, the tracking unit 103 tracks changes in the position and posture of the imaging device 17 by repeatedly executing such tracking processing on a plurality of captured images captured in time series by the imaging device 17.
  • the tracking method is not limited to the above example.
  • Tracking methods include an indirect method, in which the position and orientation of the imaging device 17 at the time each frame is captured are obtained by extracting feature points from the captured image and then solving the matching problem of the feature points, and a direct method, in which the position and orientation of the imaging device 17 at the time each frame is captured are estimated by directly estimating the transformation between captured images without a feature point extraction process.
  • In the above example, the movement of the position and posture of the imaging device 17 is calculated by projecting feature points, but the tracking unit 103 may execute tracking by the direct method. Further, the tracking unit 103 may specify the position and orientation of the imaging device 17 in consideration of not only the captured image but also the detection result of the IMU sensor 18.
  • the tracking unit 103 sends the current position and orientation of the specified imaging device 17 to the bundle adjustment unit 104 and the conversion unit 102.
  • the bundle adjustment unit 104 corrects the position and orientation of the image pickup device 17 specified by the tracking unit 103 and the position information of surrounding objects by the bundle adjustment process.
  • the bundle adjustment unit 104 outputs the self-position of the information processing device 1 and the map information as the processing result.
  • The captured image captured by the imaging device 17 is sent to the bundle adjustment unit 104, which minimizes the reprojection error of each frame.
  • The bundle adjustment unit 104 minimizes the reprojection error of each frame by optimizing the world coordinate points (three-dimensional position coordinates) of each point in the surrounding environment, the position and orientation of the imaging device 17, and the internal parameters of the imaging device 17.
  • the internal parameters of the imaging device 17 do not have to be updated by the bundle adjustment unit 104 if the camera has been calibrated in advance.
  • the internal parameters of the image pickup device 17 are, for example, the focal length and the principal point. In bundle adjustment, the position and orientation of the imaging device 17 are also referred to as external parameters.
  • As described above, the bundle adjustment unit 104 of the present embodiment adopts, as the initial values of the world coordinate points of each point in the surrounding environment, the three-dimensional coordinates indicating the positions of the surrounding structures converted from the BIM information by the conversion unit 102.
  • the bundle adjustment unit 104 adjusts the error of the position and orientation of the image pickup device 17 specified by the tracking unit 103 by this bundle adjustment.
  • The bundle adjustment unit 104 obtains, through bundle adjustment, the world coordinate points of each point, the position and orientation of the imaging device 17, and the internal parameters of the imaging device 17 that minimize the reprojection error, so that a position and orientation of the imaging device 17 with reduced error are obtained as a result.
  • the set of world coordinate points after bundle adjustment becomes map information.
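  • As a hedged sketch of the objective described above (the patent's equation (4) may differ in its exact form), a generic bundle-adjustment cost over the world coordinate points X_i, initialized from the BIM information, and the camera poses (R_j, t_j) can be written as follows, with K the pre-calibrated intrinsics, x_ij the observed feature point, and π the perspective projection.

```latex
% Hedged sketch of the reprojection-error objective; not the patent's exact equation (4).
\min_{\{X_i\},\,\{R_j,\,t_j\}} \;
  \sum_{i,j} \bigl\| \, x_{ij} - \pi\!\bigl( K \,(R_j X_i + t_j) \bigr) \bigr\|^2
```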
  • FIG. 4 is an image diagram showing an example of the positional relationship between the information processing device 1 and surrounding objects according to the first embodiment.
  • the information processing device 1 is assumed to move in the building 9 in which the pillars 90a to 90c are installed. Pillars 90a to 90c are examples of objects.
  • The distance d in FIG. 4 is the distance from the imaging device 17 to the point 52 on the plane 901 of the pillar 90c that faces the information processing device 1.
  • the conversion unit 102 specifies the initial value of the three-dimensional coordinates of the point 52 on the plane 901.
  • the bundle adjustment unit 104 starts the adjustment process from the initial value, and based on the position and orientation of the image pickup device 17 specified by the tracking unit 103 and the captured image, the self-position and the position of the point 52 Adjust the error.
  • the conversion unit 102 corrects the three-dimensional coordinates of the point 52 by adjusting the error between the self-position and the position of the point 52, and obtains the three-dimensional coordinates with higher accuracy.
  • the bundle adjustment unit 104 estimates the self-position and the three-dimensional coordinates of the point 52.
  • Because the bundle adjustment refines the initial values derived from the BIM information, the bundle adjustment unit 104 can also estimate the position of an object that is not included in the BIM information.
  • FIG. 5 is an image diagram showing an example of bundle adjustment according to the first embodiment.
  • Using the following equation (4), the bundle adjustment unit 104 estimates the position of the imaging device 17 and the three-dimensional coordinates of the point 52 so as to minimize the error between the projection points 401a and 401b, obtained by projecting the point 52 in three-dimensional space onto the two captured images 43 and 44 shown in FIG. 5, and the feature points 402a and 402b corresponding to the point 52 drawn on the captured images 43 and 44.
  • When the captured image 43 and the captured image 44 are distinguished, the captured image 43 is referred to as a first image and the captured image 44 as a second image for convenience.
  • The internal parameters of the imaging device 17 are assumed to have been calibrated in advance and are not included in the parameters to be optimized in equation (4).
  • The initial value generated by the conversion unit 102 described above is used in equation (4) as the initial value of the world coordinate point (the point X_i in three-dimensional space) shown as the point 52. Further, in FIG. 5, the lines connecting the reference points 170a and 170b, which represent the positions of the imaging device 17, to the point 52 are referred to as ray bundles 6a and 6b. Also, when a range of initial values is set by the conversion unit 102, the calculation of equation (4) is started from world coordinates of the point X_i included in the set range. If the error is not minimized within the range of initial values, the optimum value of the point X_i in three-dimensional space may be determined outside that range.
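  • A minimal Python sketch of this kind of bundle adjustment, seeded with BIM-derived initial 3D points, is shown below. It is illustrative only: the residual corresponds to a generic reprojection error rather than the patent's equation (4), and all names are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation


def reprojection_residuals(params, K, observations, n_frames, n_points):
    """observations: list of (frame_idx, point_idx, observed_uv) tuples."""
    poses = params[:n_frames * 6].reshape(n_frames, 6)   # axis-angle + translation
    points = params[n_frames * 6:].reshape(n_points, 3)  # world coordinate points
    residuals = []
    for f, i, uv_obs in observations:
        R = Rotation.from_rotvec(poses[f, :3]).as_matrix()
        t = poses[f, 3:]
        Xc = R @ points[i] + t                 # world -> camera
        uv = (K @ Xc)[:2] / Xc[2]              # perspective projection
        residuals.append(uv - uv_obs)
    return np.concatenate(residuals)


def bundle_adjust(K, observations, initial_poses, bim_initial_points):
    """Bundle adjustment seeded with initial values converted from BIM information.

    initial_poses: (n_frames, 6) poses from the tracking unit (axis-angle + translation).
    bim_initial_points: (n_points, 3) world coordinates generated by the conversion unit.
    """
    n_frames, n_points = len(initial_poses), len(bim_initial_points)
    x0 = np.concatenate([np.asarray(initial_poses).ravel(),
                         np.asarray(bim_initial_points).ravel()])
    result = least_squares(reprojection_residuals, x0, loss="huber",
                           args=(K, observations, n_frames, n_points))
    refined_poses = result.x[:n_frames * 6].reshape(n_frames, 6)
    map_points = result.x[n_frames * 6:].reshape(n_points, 3)
    return refined_poses, map_points  # self-position estimate and map information
```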
  • The bundle adjustment unit 104 estimates the position of the plane or curved surface of a surrounding object based on the BIM information, and calculates the distance from the imaging device 17 to the surrounding object based on the constraint condition that a plurality of points existing in the surroundings are located on that plane or curved surface.
  • the points 50b to 50d shown in FIG. 4 all exist on the plane 901.
  • the bundle adjustment unit 104 imposes a constraint condition by the equation of a plane when executing the bundle adjustment process based on the captured image with the plane 901 as the imaging range.
  • When the points 50a to 50d in the three-dimensional space are not particularly distinguished, they are simply referred to as points 50.
  • The bundle adjustment unit 104 estimates the position of the point 50 and the position of the imaging device 17 under the constraint that the point lies on the plane, by solving the optimization problem with the nonlinear least squares method using the nonlinear functions f(x) and g(x).
  • The function f(x) corresponds to the above equation (4).
  • To solve this constrained problem, the penalty method or the augmented Lagrangian method can be applied, but other solution methods may also be adopted.
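  • A hedged sketch of this constrained formulation (the patent's equations (5) and (6) are not reproduced here and may differ): f(x) is the reprojection error of equation (4) and each g_k(x) constrains a point X_k to a plane with normal n and offset d estimated from the BIM information; the penalty method replaces the constraint with a quadratic penalty term weighted by μ.

```latex
% Hedged sketch of the plane-constrained bundle adjustment; illustrative only.
\min_{x} \; f(x)
  \quad \text{s.t.} \quad g_k(x) = n^{\top} X_k + d = 0, \quad k = 1, \dots, m
\qquad\Longrightarrow\qquad
\min_{x} \; f(x) + \frac{\mu}{2} \sum_{k=1}^{m} g_k(x)^2
```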
  • the structure around the information processing device 1 may have a curved surface as well as a flat surface.
  • the outer surface of the pillar 90b is a curved surface.
  • the bundle adjustment unit 104 may impose a constraint condition by a curved surface equation so that a point on a three-dimensional space is on a curved surface based on BIM information.
  • the bundle adjustment unit 104 generates a point cloud having the three-dimensional coordinates as map information based on the three-dimensional coordinates of the plurality of points 50 after the bundle adjustment. Further, the bundle adjustment unit 104 updates the map information by adding or deleting a new point 50 to the map information. In addition, the bundle adjustment unit 104 may adjust the self-position estimation result and the map information in consideration of the detection result of the IMU sensor 18.
  • In this way, the bundle adjustment unit 104 calculates the positions of surrounding objects as the spatial coordinates of the plurality of points 50 in three-dimensional space, and outputs the calculated spatial coordinates of the plurality of points 50 as map information.
  • the term "output" includes storage in the auxiliary storage device 14 or transmission to the external device 2.
  • the bundle adjustment unit 104 stores the estimated self-position and the generated map information in the auxiliary storage device 14. Further, the bundle adjustment unit 104 may transmit the estimated self-position and the generated map information to the external device 2.
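  • The map information described above can be pictured, purely as an illustration, as a growable set of world-coordinate points with the add, delete, and store operations mentioned in the preceding items; the class and method names below are hypothetical.

```python
import numpy as np

class PointCloudMap:
    """Hypothetical sketch of the map information: a point cloud in world coordinates."""

    def __init__(self):
        self.points = np.empty((0, 3))

    def add(self, new_points):
        """Append newly estimated world coordinates (M, 3) after bundle adjustment."""
        self.points = np.vstack([self.points, np.asarray(new_points, dtype=float)])

    def delete(self, indices):
        """Remove points that are no longer needed."""
        self.points = np.delete(self.points, indices, axis=0)

    def save(self, path):
        """'Output' in the sense above: store to local storage (or send to a server)."""
        np.save(path, self.points)
```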
  • the movement control unit 105 moves the information processing device 1 by controlling the movement device 16. For example, the movement control unit 105 searches for a movable route based on the map information stored in the auxiliary storage device 14 and the current self-position. The movement control unit 105 controls the movement device 16 based on the search result.
  • When the information processing device 1 includes sensors such as a distance measuring sensor, the movement control unit 105 may generate a route that avoids obstacles based on the detection results of obstacles or the like by those sensors.
  • the movement control method of the information processing device 1 is not limited to these, and various autonomous movement methods can be applied.
  • FIG. 6 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the first embodiment.
  • the acquisition unit 101 acquires BIM information from the external device 2 (S1).
  • the acquisition unit 101 stores the acquired BIM information in the auxiliary storage device 14.
  • the movement control unit 105 starts the movement of the information processing device 1 by controlling the movement device 16 (S2).
  • The acquisition unit 101 acquires a captured image from the imaging device 17.
  • the acquisition unit 101 acquires sensing results such as angular velocity and acceleration from the IMU sensor 18 (S3).
  • the tracking unit 103 identifies the current position and orientation of the image pickup device 17 based on the captured image (S4).
  • the conversion unit 102 generates an initial value of the three-dimensional coordinates of the point in the structure from the BIM information based on the current position and orientation of the image pickup device 17 specified by the tracking unit 103 (S5).
  • The bundle adjustment unit 104 executes the bundle adjustment process (S6). Specifically, based on the initial values of the three-dimensional coordinates of points on the structures around the imaging device 17 generated from the BIM information, the position and orientation of the imaging device 17 specified by the tracking unit 103, and the captured image, the bundle adjustment unit 104 calculates the distance from the imaging device 17 to surrounding objects and estimates the position and orientation of the imaging device 17 and the three-dimensional coordinates of the surrounding objects. In addition, the bundle adjustment unit 104 generates map information based on the estimated three-dimensional coordinates of the surrounding objects.
  • the bundle adjustment unit 104 stores the estimated self-position and the generated map information in, for example, the auxiliary storage device 14.
  • The movement control unit 105 searches for a movement route based on the map information stored in the auxiliary storage device 14 and the current self-position, and controls the moving device 16 based on the search result to move the information processing device 1.
  • the movement control unit 105 determines whether or not to end the movement of the information processing device 1 (S7).
  • For example, the movement control unit 105 determines that the movement of the information processing device 1 is to be ended when the information processing device 1 arrives at a predetermined end point.
  • The conditions for determining the end of movement are not particularly limited. For example, the movement control unit 105 may determine that the movement of the information processing device 1 is to be ended when it receives an instruction to end the movement from the outside via the communication network 3.
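  • As an illustration only, the S1 to S7 flow of FIG. 6 can be summarized in pseudocode as below; the objects and method names stand in for the units described above and are hypothetical, and the loop reflects the repeated detection, estimation, and movement described in this embodiment.

```python
def run_self_localization_and_mapping(acquisition, conversion, tracking,
                                      bundle_adjustment, movement_control,
                                      storage):
    """Hypothetical sketch of the S1-S7 flow in FIG. 6 (illustrative names only)."""
    bim_info = acquisition.acquire_bim_information()           # S1: acquire BIM information
    storage.store(bim_info)
    movement_control.start_moving()                             # S2: start moving

    while True:
        image, imu = acquisition.acquire_detection_result()     # S3: captured image + IMU
        pose = tracking.track(image, imu)                       # S4: current position/orientation
        init_values = conversion.to_initial_values(bim_info, pose)          # S5: initial coordinates
        pose, map_info = bundle_adjustment.adjust(image, pose, init_values)  # S6: bundle adjustment
        storage.store(pose, map_info)
        movement_control.move_along(map_info, pose)             # route search and movement

        if movement_control.should_stop():                      # S7: end of movement?
            break
```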
  • As described above, the information processing device 1 of the present embodiment executes self-position estimation and map information generation based on the BIM information and the images captured around the information processing device 1. Therefore, according to the information processing device 1 of the present embodiment, using the BIM information in the self-position estimation and map information generation processing makes it possible to improve the accuracy of the self-position estimation and of the map information.
  • The information processing device 1 of the present embodiment converts the BIM information into an input value for at least one of the self-position estimation process and the map information generation process by the SLAM processing unit 120, and executes self-position estimation and map information generation based on that input value, so that the accuracy of the self-position estimation and the map information can be improved compared with self-position estimation and map information generation based only on peripheral detection results such as captured images.
  • The information processing device 1 of the present embodiment identifies the position and orientation of the imaging device 17 by tracking a plurality of captured images captured at different times, and obtains the distance from the imaging device 17 to surrounding objects based on the BIM information.
  • In the bundle adjustment process, the information processing device 1 of the present embodiment uses the distance from the imaging device 17 to the surrounding object based on the BIM information as the initial value or the range of initial values.
  • Here, the distance from the imaging device 17 to the surrounding object based on the BIM information is the distance from the imaging device 17 to the surrounding object expressed in three-dimensional coordinates.
  • In general, the initial value of the three-dimensional coordinates of a point in three-dimensional space may have to be assumed to be at infinity.
  • In that case, the bundle adjustment process starts without knowing whether the distance between the point in three-dimensional space and the imaging device is 1 m or 1000 m, so the amount of calculation required until the calculation result converges may increase.
  • Since the information processing device 1 of the present embodiment uses initial values based on the BIM information, the processing result can be converged with a smaller amount of calculation.
  • Further, the information processing device 1 of the present embodiment estimates the position of the plane or curved surface of a surrounding object based on the BIM information, and calculates the distance from the imaging device 17 to the surrounding objects based on the constraint condition that a plurality of points existing in the surroundings are located on that plane or curved surface. Therefore, according to the information processing device 1 of the present embodiment, the amount of calculation can be reduced compared with the case where the positions of a plurality of points existing on the same plane or curved surface are obtained separately.
  • The information processing device 1 of the present embodiment calculates the positions of surrounding objects as the spatial coordinates of the plurality of points 50 in three-dimensional space, and outputs the calculated spatial coordinates of the plurality of points 50 as map information. According to the information processing device 1 of the present embodiment, more accurate map information can be provided by outputting, as map information, the positions of surrounding objects calculated by the bundle adjustment process using the BIM information.
  • the information processing device 1 may be a robot or the like having functions such as monitoring, security, cleaning, and delivery of luggage. In this case, the information processing device 1 realizes various functions by moving the building 9 based on the estimated self-position and map information. Further, the map information generated by the information processing device 1 may be used not only for generating the movement route of the information processing device 1 itself, but also for monitoring or managing the building 9 from a remote location. Further, the map information generated by the information processing device 1 may be used to generate a movement route of a robot or drone other than the information processing device 1.
  • the image pickup device 17 is not limited to the stereo camera.
  • The imaging device 17 may be an RGB-D camera having an RGB (Red Green Blue) camera and a depth camera for three-dimensional measurement, a monocular camera, or the like.
  • the sensor included in the information processing device 1 is not limited to the IMU sensor 18, and a gyro sensor, an acceleration sensor, a magnetic sensor, or the like may be individually provided.
  • the SLAM processing unit 120 executes the image SLAM (Visual SLAM) using the captured image, but the SLAM that does not use the captured image may be adopted.
  • The information processing device 1 may detect surrounding structures by Lidar (Light Detection and Ranging, or Laser Imaging Detection and Ranging) or the like instead of the imaging device 17.
  • the SLAM processing unit 120 may specify the position and orientation of the information processing device 1 based on the distance measurement result by Lidar.
  • the SLAM processing unit 120 is supposed to generate three-dimensional map information, but it may be possible to generate two-dimensional map information.
  • the equations (1) to (6) illustrated in the present embodiment are examples, and the mathematical expressions used in the tracking process or the bundle adjustment process are not limited to these.
  • The bundle adjustment unit 104 may perform bundle adjustment according to equation (4) without imposing the constraint conditions of equations (5) and (6). Further, the tracking process may be performed without using the neighborhood pattern N_p.
  • the SLAM processing unit 120 may estimate its own position and generate map information by a method other than tracking processing or bundle adjustment processing.
  • the tracking unit 103 is used as an example of the specific unit, but a method of specifying a change in the position and posture of the information processing device 1 by a method other than tracking may be adopted.
  • various processes for improving the accuracy of self-position estimation or map information may be added to the SLAM process.
  • the SLAM processing unit 120 may further execute a loop closing process or the like.
  • A part or all of the information processing device 1 in the above-described embodiment may be configured by hardware, or may be configured by information processing of software (a program) executed by a CPU, a GPU, or the like.
  • In the case of information processing by software, the software that realizes at least a part of the functions of each device in the above-described embodiment may be stored in a non-transitory storage medium (non-transitory computer-readable medium) such as a flexible disk, a CD-ROM (Compact Disc Read-Only Memory), or a USB memory, and may be loaded into a computer to process the information.
  • the software may be downloaded via a communication network.
  • information processing may be executed by hardware by implementing the software in a circuit such as an ASIC or FPGA.
  • the type of storage medium that stores the software is not limited.
  • the storage medium is not limited to a removable one such as a magnetic disk or an optical disk, and may be a fixed storage medium such as a hard disk or a memory. Further, the storage medium may be provided inside the computer or may be provided outside the computer.
  • The information processing device 1 described above includes one of each component, but may include a plurality of the same component.
  • The software may be installed on a plurality of computers, and each of the plurality of computers may execute the same or a different part of the processing of the software. In this case, it may be a form of distributed computing in which each computer communicates via the network interface 13 or the like to execute processing. That is, the information processing device 1 in the above-described embodiment may be configured as a system that realizes functions by one or a plurality of computers executing instructions stored in one or a plurality of storage devices. Further, the information transmitted from a terminal may be processed by one or a plurality of computers provided on the cloud, and the processing result may be transmitted to the terminal.
  • Various operations of the information processing device 1 in the above-described embodiment may be executed in parallel processing by using one or a plurality of processors or by using a plurality of computers via a network. Further, various operations may be distributed to a plurality of arithmetic cores in the processor and executed in parallel processing. In addition, some or all of the processes, means, etc. of the present disclosure may be executed by at least one of a processor and a storage device provided on the cloud capable of communicating with the information processing device 1 via the network. As described above, each device in the above-described embodiment may be in the form of parallel computing by one or a plurality of computers.
  • the information processing device 1 in the above-described embodiment may be realized by one or a plurality of processors 11.
  • the processor 11 may refer to one or more electronic circuits arranged on one chip, or may refer to one or more electronic circuits arranged on two or more chips or two or more devices. You may point. When a plurality of electronic circuits are used, each electronic circuit may communicate by wire or wirelessly.
  • A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor.
  • When the information processing device 1 in the above-described embodiment is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to that storage device (memory), it may include a configuration in which at least one of the plurality of processors is connected (coupled) to the at least one storage device (memory). Further, this configuration may be realized by storage devices (memories) and processors included in a plurality of computers. Further, a configuration in which the storage device (memory) is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.
  • the external device 2 is not limited to the server device. Further, the external device 2 may be provided in a cloud environment. Further, the external device 2 may be used as an example of the information processing device in the claims.
  • the external device 2 may be an input device.
  • the device interface 15 may be connected not only to the mobile device 16, the image pickup device 17, and the IMU sensor 18, but also to the input device.
  • the input device is, for example, a device such as a camera, a microphone, a motion capture, various sensors, a keyboard, a mouse, or a touch panel, and gives the acquired information to the information processing device 1.
  • it may be a device including an input unit, a memory and a processor such as a personal computer, a tablet terminal, or a smartphone.
  • the external device 2 may be an output device.
  • the device interface 15 may be connected to the output device.
  • The output device may be, for example, a display device such as an LCD (Liquid Crystal Display), a CRT (Cathode Ray Tube), a PDP (Plasma Display Panel), or an organic EL (Electro Luminescence) panel, or it may be a speaker or the like that outputs audio. Further, it may be a device including an output unit, a memory, and a processor, such as a personal computer, a tablet terminal, or a smartphone.
  • the external device 2 may be a storage device (memory). Further, the device interface 15 may be connected to a storage device (memory).
  • the external device 2 may be a network storage or the like, and a storage such as an HDD may be connected to the device interface 15.
  • the external device 2 or the external device connected to the device interface 15 may be a device having some functions of the components of the information processing device 1 in the above-described embodiment. That is, the information processing device 1 may transmit or receive a part or all of the processing results of the external device 2 or the external device connected to the device interface 15.
  • The information processing device 1 may be constantly connected to the external device 2 via the communication network 3, but is not limited to this.
  • the information processing device 1 may take the connection with the external device 2 offline while executing the self-position estimation process and the map information generation process.
  • In the above, the world coordinates of the points included in the building 9, specified by the conversion unit 102 from the BIM information, are used as an example of the information regarding the positions of surrounding objects, but the information regarding the positions of surrounding objects is not limited to this.
  • the conversion unit 102 may generate information indicating the distance between the information processing device 1 and a surrounding object based on the BIM information and the position and orientation of the image pickup device 17.
  • As the position and orientation of the imaging device 17, for example, the position and orientation specified by the tracking unit 103 can be adopted.
  • In this case, the distance between a point X_i in the three-dimensional space and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 is specified by the information indicating the distance generated by the conversion unit 102.
  • The bundle adjustment unit 104 sets a value for each parameter so that the distance between the point X_i in the three-dimensional space and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 matches the distance indicated by the information generated by the conversion unit 102. Also in this method, the positions of the point X_i and the imaging device 17 in the three-dimensional space obtained as a result of the error adjustment by bundle adjustment may differ from the result obtained from the BIM data.
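  • The following Python fragment is an illustrative sketch (not part of the original description) of how such a distance derived from design data could enter a least-squares adjustment as an extra residual term. The stand-in reprojection residuals, the weight w_dist, the numerical values, and the use of scipy are assumptions.

```python
# Minimal sketch: adding BIM-derived point-to-camera distances as extra
# residuals in a least-squares adjustment. Only a single point X_i and two
# camera translations t_j, t_j1 are optimized; all data values are hypothetical.
import numpy as np
from scipy.optimize import least_squares

d_bim = np.array([2.0, 1.6])   # distances X_i <-> camera j, j+1 from design data [m]
w_dist = 5.0                    # weight of the distance residuals

def residuals(params):
    X_i  = params[0:3]
    t_j  = params[3:6]
    t_j1 = params[6:9]
    r = []
    # Stand-ins for reprojection residuals (they anchor the variables):
    r.extend(X_i - np.array([2.0, 0.1, 0.0]))
    r.extend(t_j - np.array([0.0, 0.0, 0.0]))
    r.extend(t_j1 - np.array([0.5, 0.0, 0.0]))
    # BIM-derived constraints: point-to-camera distances should match d_bim.
    r.append(w_dist * (np.linalg.norm(X_i - t_j)  - d_bim[0]))
    r.append(w_dist * (np.linalg.norm(X_i - t_j1) - d_bim[1]))
    return np.asarray(r)

sol = least_squares(residuals, x0=np.zeros(9))
X_i, t_j, t_j1 = sol.x[0:3], sol.x[3:6], sol.x[6:9]
print(np.linalg.norm(X_i - t_j), np.linalg.norm(X_i - t_j1))
```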
  • the conversion unit 102 may specify the dimension of the building 9 from the BIM information, and the dimension may be used as an example of information regarding the position of a surrounding object.
  • the bundle adjustment unit 104 can reduce the amount of calculation by performing bundle adjustment using the information regarding the position of the surrounding object converted from the BIM information by the conversion unit 102.
  • In the first embodiment described above, the environmental information is three-dimensional design information such as BIM information or 3D CAD data.
  • In the second embodiment, the environmental information further includes at least one of the entry/exit information of a person in the building 9 and the image recognition result of a person in the captured image.
  • That is, the environmental information may include both the entry/exit information of a person in the building 9 and the image recognition result of a person in the captured image, or may include either one of them.
  • FIG. 7 is a block diagram showing an example of the functions included in the information processing device 1 according to the second embodiment.
  • the information processing apparatus 1 of the present embodiment includes an acquisition unit 1101, a conversion unit 1102, a SLAM processing unit 1120, and a movement control unit 105.
  • the SLAM processing unit 1120 includes a tracking unit 1103 and a bundle adjusting unit 1104.
  • the conversion unit 1102 includes an initial value generation unit 106 and a mask information generation unit 107.
  • the movement control unit 105 has the same function as that of the first embodiment.
  • the acquisition unit 1101 of the present embodiment has the same function as that of the first embodiment, and acquires the entry / exit information of the person in the building 9.
  • the entry / exit information is information indicating the number of people entering / exiting each room or floor of the building 9 and the time of entering / exiting.
  • a sensor for detecting entry / exit is installed at the entrance / exit of a room or floor of the building 9, and the detection result by the sensor is transmitted to the external device 2.
  • the acquisition unit 1101 acquires the entry / exit information from the external device 2.
  • the method of detecting the entry / exit of a person is not limited to the sensor.
  • the entry / exit information may be a reading record of a security card by a card reader or a detection result of a person from an image captured by a surveillance camera installed in a building 9.
  • the acquisition unit 1101 stores the acquired entry / exit information in the auxiliary storage device 14.
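  • As a purely illustrative sketch (not part of the original description), the following Python fragment shows one possible representation of such entry/exit records and an occupancy check derived from them; the field names and the counting rule are assumptions.

```python
# Sketch of a possible entry/exit record and an occupancy check.
# The rule "occupied if entries exceed exits up to time t" is an assumption.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class EntryExitRecord:
    area_id: str         # room or floor identifier in building 9
    timestamp: datetime  # time of the entry or exit event
    delta: int           # +1 for an entry, -1 for an exit

def occupied(records, area_id, at_time):
    """Return True if anyone is presumed to be in the area at `at_time`."""
    count = sum(r.delta for r in records
                if r.area_id == area_id and r.timestamp <= at_time)
    return count > 0

records = [
    EntryExitRecord("floor-3", datetime(2021, 4, 1, 9, 0), +1),
    EntryExitRecord("floor-3", datetime(2021, 4, 1, 11, 30), -1),
]
print(occupied(records, "floor-3", datetime(2021, 4, 1, 10, 0)))  # True
```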
  • The conversion unit 1102 of the present embodiment has the same functions as those of the first embodiment, and further generates, based on the environmental information, mask information representing an area that is excluded from the target of map information generation in the building 9 in which the information processing device 1 is located.
  • the environmental information includes at least one of the entry / exit information and the image recognition result of the person.
  • the image recognition result of a person is a result of recognizing a person by image processing from the captured image captured by the image pickup device 17.
  • the environmental information of the present embodiment includes both the entry / exit information and the image recognition result, and the same three-dimensional design information as that of the first embodiment.
  • the conversion unit 1102 includes an initial value generation unit 106 and a mask information generation unit 107.
  • the initial value generation unit 106 has the same function as the conversion unit 102 in the first embodiment.
  • the mask information is information representing an area excluded from the target of map information generation.
  • The mask information generation unit 107 determines the area where a person is located in the building 9 based on the entry/exit information or the image recognition result of the person, and sets the determined area as an area to be excluded from the target of map information generation.
  • For example, the mask information generation unit 107 recognizes a person by image processing from the captured image captured by the imaging device 17. Further, when it is difficult to determine by the image processing whether or not an object depicted in the captured image is a person, the mask information generation unit 107 determines, based on the entry/exit information, whether or not a person existed in the room or floor where the captured image was captured at the time when the captured image was captured. When the mask information generation unit 107 determines that a person existed in the room or floor where the captured image was captured at that time, it is more probable that the object depicted in the captured image is a person than when it determines that no person existed.
  • FIG. 8 is an image diagram showing an example of the positional relationship between the information processing device 1 and surrounding objects according to the second embodiment.
  • A person 70 exists in the room of the building 9 in which the information processing device 1 is located. Unlike the pillars 90a to 90c, the person 70 moves, so if the presence of the person 70 is included in the map information, the accuracy of the map information may decrease.
  • the mask information generation unit 107 generates mask information representing the area 80 in which the person 70 exists.
  • the mask information represents, for example, the area 80 in which the person 70 exists in three-dimensional coordinates.
  • In the present embodiment, the mask information generation unit 107 generates the mask information using both the entry/exit information and the image recognition result of the person, but the mask information may be generated based on only one of them.
  • The mask information generation unit 107 may also detect, by image recognition from the image captured by the imaging device 17, a moving body such as a vehicle, or an object temporarily existing in the building 9 such as a dolly or a piece of equipment. In this case, the mask information generation unit 107 sets the area where it is determined that these objects exist as an area to be excluded from the target of map information generation.
  • the mask information generation unit 107 generates mask information representing an area to be excluded from the target of map information generation, and sends it to the SLAM processing unit 1120.
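  • The following Python fragment is an illustrative sketch only (not part of the original description) of how such mask information might be assembled from person detections and the entry/exit information; the inputs, the bounding-box format, and the decision rule for low-confidence detections are assumptions.

```python
# Sketch: building a binary mask that marks image regions to be excluded
# from map generation. Person bounding boxes and the occupancy flag are
# assumed inputs (e.g. from an image recognizer and the entry/exit records).
import numpy as np

def build_mask(image_shape, person_boxes, area_occupied, low_conf_boxes=()):
    """Return a mask with 0 where map generation should be suppressed.

    person_boxes:   boxes (x0, y0, x1, y1) confidently recognized as people.
    low_conf_boxes: ambiguous boxes; they are masked only when the entry/exit
                    information says the area is occupied.
    """
    mask = np.ones(image_shape[:2], dtype=np.uint8)
    boxes = list(person_boxes)
    if area_occupied:
        boxes += list(low_conf_boxes)
    for x0, y0, x1, y1 in boxes:
        mask[y0:y1, x0:x1] = 0
    return mask

mask = build_mask((480, 640), [(100, 120, 180, 400)], area_occupied=True,
                  low_conf_boxes=[(300, 150, 360, 380)])
print(mask.sum(), "pixels remain usable")
```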
  • the SLAM processing unit 1120 of the present embodiment has the functions of the first embodiment and does not generate map information for the area corresponding to the mask information.
  • When the tracking unit 1103 of the SLAM processing unit 1120 of the present embodiment performs the tracking process by the same equation (1) as in the first embodiment, if the neighborhood pattern N_p is an image area represented by the mask information, the tracking unit 1103 multiplies the neighborhood pattern N_p by a mask value.
  • the mask value is, for example, “0” or “1”, but is not limited thereto.
  • the method of applying the mask is not limited to this, and other methods may be adopted.
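  • As a purely illustrative sketch (not the patent's equation (1)), the following Python fragment shows one generic way a neighborhood pattern N_p could be weighted by a mask value so that masked pixels do not contribute to the tracking cost; the cost function and the values are assumptions.

```python
# Sketch: weighting the photometric cost of a neighbourhood pattern N_p
# by a mask value, so that masked pixels do not contribute to tracking.
import numpy as np

def masked_patch_cost(ref_patch, cur_patch, mask_patch):
    """ref_patch, cur_patch: intensities of N_p in two images.
    mask_patch: 1 for usable pixels, 0 for pixels inside masked regions."""
    diff = (ref_patch - cur_patch) * mask_patch
    n = mask_patch.sum()
    return (diff ** 2).sum() / max(n, 1)

ref = np.random.rand(8, 8)
cur = ref + 0.01 * np.random.randn(8, 8)
mask = np.ones((8, 8)); mask[:, :3] = 0  # left three columns of the patch are masked
print(masked_patch_cost(ref, cur, mask))
```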
  • the bundle adjustment unit 1104 of the SLAM processing unit 1120 of the present embodiment has the functions of the first embodiment, and the area corresponding to the mask information is excluded from the bundle adjustment.
  • FIG. 9 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the second embodiment.
  • the process of acquiring the BIM information in S1 is the same as that of the first embodiment described with reference to FIG.
  • The acquisition unit 1101 acquires the entry/exit information (S21).
  • the acquisition unit 1101 stores the acquired entry / exit information in the auxiliary storage device 14.
  • The process from the start of the movement of the information processing device 1 in S2 to the acquisition of the captured image and the sensing results such as angular velocity and acceleration in S3 is the same as in the first embodiment.
  • the mask information generation unit 107 of the conversion unit 1102 of the present embodiment generates mask information representing an area to be excluded from the target of map information generation based on the entry / exit information or the image recognition result of the person 70 ( S22).
  • the tracking unit 1103 of the SLAM processing unit 1120 of the present embodiment identifies the current position and orientation of the image pickup device 17 based on the captured image (S4). At this time, the tracking unit 1103 excludes the area corresponding to the mask information from the tracking process.
  • The initial value generation unit 106 of the conversion unit 1102 of the present embodiment generates, based on the current position and orientation of the imaging device 17 specified by the tracking unit 1103 and on the BIM information, the initial values of the three-dimensional coordinates of points in the structures around the imaging device 17 (S5). Since the area corresponding to the mask information is not subject to the generation of map information, the initial value generation unit 106 does not generate the initial values of the three-dimensional coordinates of points in the structures in the area corresponding to the mask information.
  • the bundle adjustment unit 1104 executes the bundle adjustment process (S6).
  • the bundle adjustment unit 1104 excludes the area corresponding to the mask information from the bundle adjustment.
  • the process of determining whether or not to end the movement of the information processing device 1 in S7 is the same as that of the first embodiment.
  • The acquisition unit 1101 acquires the latest entry/exit information again (S23), and the process returns to S3.
  • As described above, the information processing device 1 of the present embodiment generates, based on the environmental information, mask information representing the area in the building 9 excluded from the target of map information generation, and does not generate map information for the area corresponding to the mask information. Therefore, according to the information processing device 1 of the present embodiment, things that may reduce the accuracy of the map information, such as the person 70 and temporarily existing objects, can be excluded, so that the accuracy of the map information can be improved.
  • the environmental information includes the entry / exit information of the person 70 in the building 9, or the image recognition result of the person 70 in the image captured by the image pickup device 17 mounted on the information processing device 1.
  • The information processing device 1 of the present embodiment determines the area where the person 70 is located in the building 9 based on the entry/exit information or the image recognition result of the person 70, and sets the determined area as an area to be excluded from the target of map information generation.
  • the information processing device 1 can improve the accuracy of the map information by not reflecting the worker in the map information. Further, with such a configuration, the information processing apparatus 1 of the present embodiment can robustly execute processing even when the surrounding environment changes depending on a person or the like.
  • In the present embodiment, the bundle adjustment process is performed based on the environmental information as in the first embodiment, but the information processing device 1 of the second embodiment does not have to have all the functions of the first embodiment.
  • the information processing device 1 may use the environmental information only for generating the mask information and may not use it for the bundle adjustment process.
  • the environmental information does not have to include the three-dimensional design information.
  • The timing of using the mask information is not limited to this.
  • mask information based on entry / exit information at a past time may be used.
  • FIG. 10 is a block diagram showing an example of the functions included in the information processing device 1 according to the third embodiment.
  • the information processing apparatus 1 of the present embodiment includes an acquisition unit 1101, a marker detection unit 108, a calibration unit 109, a conversion unit 2102, a SLAM processing unit 2120, and a movement control unit 105.
  • the SLAM processing unit 2120 includes a tracking unit 2103 and a bundle adjustment unit 1104.
  • the conversion unit 2102 includes an initial value generation unit 1106 and a mask information generation unit 1107.
  • the movement control unit 105 has the same functions as those of the first and second embodiments.
  • the acquisition unit 1101 has the same function as that of the second embodiment.
  • the marker detection unit 108 detects the AR marker from the captured image.
  • the AR marker has, for example, information on three-dimensional coordinates representing the position where the AR marker is described.
  • the three-dimensional coordinates are consistent with the coordinate system in the BIM information.
  • That is, the AR marker represents, in the coordinate system of the BIM information, the position where the AR marker is installed.
  • the AR marker is an example of index information in this embodiment. It is assumed that the AR marker is installed on a wall, a pillar, or the like along the passage of the building 9. Specifically, the AR marker is, for example, a QR code (registered trademark) or the like, but is not limited thereto. The number of AR markers is not particularly limited, but it is assumed that a plurality of AR markers are installed per building 9. Further, the marker detection unit 108 is an example of the index detection unit in the present embodiment. The marker detection unit 108 sends the detection result of the AR marker to the calibration unit 109.
  • Based on the detection result of the AR marker, the calibration unit 109 adjusts the coordinate system representing the self-position held internally so as to match the coordinate system of the BIM information.
  • the calibration unit 109 is an example of the coordinate adjustment unit in this embodiment.
  • the locus of change in self-position estimated by the SLAM processing unit 2120 is stored in the auxiliary storage device 14, but an error in self-position may be accumulated as the information processing device 1 moves.
  • the calibration unit 109 eliminates the accumulation of such errors by adjusting the current position of the information processing device 1 based on the three-dimensional coordinates represented by the AR marker detected by the marker detection unit 108.
  • the calibration unit 109 sends the calibration result to the conversion unit 2102.
  • More specifically, the calibration unit 109 sends a transformation matrix for correcting the self-position to the conversion unit 2102.
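  • The following Python fragment is an illustrative sketch, not part of the original description: it shows one common way a correction transform could be computed from a marker whose pose is known in the BIM coordinate system and applied to the internally held self-position. The use of 4x4 homogeneous matrices and the numerical values are assumptions.

```python
# Sketch: using a detected AR marker whose pose is known in the BIM
# coordinate system to correct accumulated drift of the internal SLAM pose.
import numpy as np

def correction_from_marker(T_marker_in_bim, T_marker_in_slam):
    """Transformation that maps internal SLAM coordinates to BIM coordinates."""
    return T_marker_in_bim @ np.linalg.inv(T_marker_in_slam)

def apply_correction(T_correction, T_robot_in_slam):
    return T_correction @ T_robot_in_slam

# Hypothetical values: the marker sits 5 m along x in BIM coordinates,
# while SLAM has drifted and places it at 4.8 m.
T_marker_bim = np.eye(4);  T_marker_bim[0, 3] = 5.0
T_marker_slam = np.eye(4); T_marker_slam[0, 3] = 4.8
T_corr = correction_from_marker(T_marker_bim, T_marker_slam)
T_robot_slam = np.eye(4);  T_robot_slam[0, 3] = 4.0
print(apply_correction(T_corr, T_robot_slam)[:3, 3])  # robot position in BIM frame
```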
  • The conversion unit 2102 of the present embodiment has the same functions as those of the first and second embodiments, and further converts the environmental information into input values of the self-position estimation process or the map information generation process by the SLAM processing unit 2120 based on the self-position adjusted by the calibration unit 109.
  • The initial value generation unit 1106 has the same function as that of the second embodiment, and further generates the initial values of the bundle adjustment process, or input values representing the range of the initial values, based on the position and orientation of the imaging device 17 specified by the tracking unit 2103 and the self-position adjusted by the calibration unit 109. More specifically, the initial value generation unit 1106 specifies, by the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the initial values of the bundle adjustment process or the input values representing the range of the initial values.
  • the mask information generation unit 1107 has the same function as that of the second embodiment, and generates mask information based on the self-position adjusted by the calibration unit 109.
  • More specifically, the mask information generation unit 1107 specifies, by the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the mask information.
  • The bundle adjustment unit 1104 of the SLAM processing unit 2120 has the same functions as those of the first and second embodiments, and uses, for the bundle adjustment, the initial values of the bundle adjustment process or the range of the initial values generated by the conversion unit 2102 based on the self-position adjusted by the calibration unit 109.
  • FIG. 11 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the third embodiment.
  • the process from the process of acquiring the BIM information of S1 to the process of acquiring the captured image and the sensing result of S3 is the same as that of the second embodiment.
  • the marker detection unit 108 detects the AR marker from the captured image (S31).
  • the marker detection unit 108 sends the detection result of the AR marker to the calibration unit 109.
  • The calibration unit 109 executes the calibration process based on the detection result of the AR marker (S32). For example, the calibration unit 109 generates a transformation matrix for adjusting the self-position to the coordinate system of the BIM information. The calibration unit 109 sends the generated transformation matrix to the conversion unit 2102.
  • The mask information generation unit 1107 specifies, by the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and generates the mask information (S22).
  • the tracking process of S4 is the same as that of the first and second embodiments, but the calibration result by the calibration unit 109 may also be used in the process.
  • the calibration unit 109 sends the calibration result to the conversion unit 2102, but may further send the calibration result to the SLAM processing unit 2120.
  • the tracking unit 2103 of the SLAM processing unit 2120 executes the tracking process using the three-dimensional coordinates based on the calibration result.
  • The initial value generation unit 1106 specifies, by the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and generates the initial values of the bundle adjustment process or the input values representing the range of the initial values (S5).
  • the bundle adjustment unit 1104 may execute the tracking process and the bundle adjustment process using the three-dimensional coordinates based on the calibration result.
  • As described above, the information processing device 1 of the present embodiment detects, from the captured image or other detection results, the index information whose position is represented in the coordinate system of the BIM information, and converts the environmental information into input values of the self-position estimation process or the map information generation process by the SLAM processing unit 2120 based on the coordinate system adjusted using the index information. Therefore, according to the information processing device 1 of the present embodiment, the error between the BIM information and the internal SLAM coordinate system of the information processing device 1 is reduced, and self-position estimation and map information generation can be performed with higher accuracy.
  • the AR marker is illustrated as the index information, but the index information is not limited to this.
  • the index information may be a sign or the like that can be captured by Lidar or various sensors, or may be a beacon or the like.
  • In the present embodiment, the information processing device 1 is described as having the functions of both the first embodiment and the second embodiment, but the information processing device 1 of the present embodiment does not have to have all the functions of the first and second embodiments.
  • the information processing apparatus 1 may use the environmental information only for either bundle adjustment or generation of mask information. Further, the environmental information may include any of three-dimensional design information, entry / exit information, and image recognition result of a person.
  • the information processing apparatus 1 uses the captured image for recognizing a person, but the use of the captured image is not limited to this.
  • The information processing device 1 segments the captured image based on the recognition result of the objects depicted in the captured image, and performs SLAM processing based on the segmentation result.
  • the information processing device 1 of the present embodiment includes an acquisition unit 101, a conversion unit 102, a SLAM processing unit 120, and a movement control unit 105.
  • the acquisition unit 101 has the same function as that of the first embodiment. Specifically, the acquisition unit 101 acquires an captured image from the imaging device 17 via the device interface 15.
  • the conversion unit 102 has the same function as that of the first embodiment, and then segments the captured image acquired by the acquisition unit 101 based on the recognition result of the object drawn on the captured image.
  • FIG. 12 is a diagram showing an example of segmentation of the captured image 60 according to the fourth embodiment.
  • the captured image 60 depicts the environment around the information processing device 1.
  • the conversion unit 102 recognizes the image area in which the object is drawn and the type of each object from the captured image 60.
  • the recognition result of the object is information in which the two-dimensional coordinates of the image area in which the object is drawn and the type of each object are associated with each other.
  • the environmental information includes at least the captured image 60.
  • the segmentation result of the captured image 60 may be used as an example of the environmental information instead of the captured image 60 itself.
  • the conversion unit 102 recognizes the object depicted in the captured image 60.
  • The objects include structures such as walls and pillars, fixtures, furniture, moving bodies, temporarily placed objects, people, and the like.
  • the conversion unit 102 recognizes the individual objects drawn on the captured image 60 by inputting the captured image 60 into the trained model configured by, for example, a neural network or the like.
  • the captured image 60 depicts a person 70, boxes 75a and 75b, a pillar 90, a wall 91, and a floor 92.
  • the conversion unit 102 recognizes these objects.
  • "People,” “boxes,” “pillars,” “walls,” and “floors” are examples of object types.
  • the recognition of the person 70 and the recognition of other objects may be executed separately.
  • The conversion unit 102 segments the captured image 60 based on the recognition result of the objects. On the right side of FIG. 12, the segmentation result 61 of the captured image 60 is shown. In the example shown in FIG. 12, the conversion unit 102 sets the image area in which the pillar 90 and the wall 91 are depicted as area A1, the image area in which the floor 92 is depicted as area A2, the image area in which the boxes 75a and 75b are depicted as area A3, and the image area in which the person 70 is depicted as area A4, thereby segmenting the captured image 60 into four areas.
  • The unit of division is not limited to the example shown in FIG. 12. Hereinafter, when the areas A1 to A4 are not particularly distinguished, they are simply referred to as areas A.
  • the conversion unit 102 classifies the recognized objects, that is, the person 70, the boxes 75a and 75b, the pillar 90, the wall 91, and the floor 92 according to whether or not they are permanent objects.
  • the pillar 90, the wall 91, and the floor 92 are permanent objects because they are part of the building 9.
  • the person 70 and the boxes 75a and 75b are non-permanent objects. Whether or not each object is permanently installed is determined by, for example, a trained model.
  • a permanently installed object is an object that does not move from the installation position once it is installed.
  • an object that is a part of the building 9, such as the pillar 90, the wall 91, and the floor 92, is basically a permanent object because it does not move.
  • An object that is not permanently installed is an object that is likely to move from the installation position.
  • a person 70, a moving body such as a cart or a forklift, temporarily installed fixtures, luggage boxes 75a, 75b, etc. are non-permanent objects.
  • the conversion unit 102 associates the segmented areas A1 to A4 with whether or not the object drawn in each area is a permanently installed object.
  • the information in which the segmented areas A1 to A4 are associated with whether or not the object drawn in each area is a permanently installed object is referred to as a segmentation result.
  • the conversion unit 102 sends the segmentation result to the SLAM processing unit 120.
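  • The following Python fragment is an illustrative sketch, not part of the original description: it derives a per-pixel "permanently installed" flag from a class label map such as the segmentation result described above. The class names, label ids, and the helper function are assumptions.

```python
# Sketch: turning a per-pixel class label map (the segmentation of the
# captured image) into a per-pixel "permanent object" flag.
import numpy as np

PERMANENT_CLASSES = {"pillar", "wall", "floor"}      # part of building 9
NON_PERMANENT_CLASSES = {"person", "box", "cart"}    # may move

def permanence_map(label_map, id_to_name):
    """label_map: HxW array of class ids; returns HxW bool array,
    True where a permanently installed object is depicted."""
    permanent_ids = {i for i, name in id_to_name.items()
                     if name in PERMANENT_CLASSES}
    return np.isin(label_map, list(permanent_ids))

id_to_name = {0: "floor", 1: "wall", 2: "pillar", 3: "box", 4: "person"}
labels = np.array([[1, 1, 4],
                   [0, 3, 4]])
print(permanence_map(labels, id_to_name))
```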
  • the object recognition and the segmentation process based on the result of the object recognition have been described separately, but these processes may be integrated.
  • a trained model that outputs the segmentation result of the captured image 60 when the captured image 60 is input may be adopted.
  • the conversion unit 102 inputs the captured image 60 into the trained model and obtains the segmentation result output from the trained model.
  • the method of object recognition from the captured image 60 and the segmentation of the captured image 60 is not limited to the above example.
  • the conversion unit 102 may apply a machine learning or deep learning technique other than the neural network to perform object recognition from the captured image 60 and segmentation of the captured image 60.
  • the SLAM processing unit 120 of the present embodiment has the functions of the first embodiment, and then estimates the self-position and generates map information based on the segmentation result of the captured image 60.
  • For example, the SLAM processing unit 120 identifies the three-dimensional space corresponding to the areas A1 and A2 in which permanently installed objects are depicted in the captured image 60, and makes that three-dimensional space the target of the self-position estimation process and the map information generation process. Further, the SLAM processing unit 120 identifies the three-dimensional space corresponding to the areas A3 and A4 in which non-permanently installed objects are depicted in the captured image 60, and excludes that three-dimensional space from the target of the self-position estimation process and the map information generation process. In this case, the information representing the areas A3 and A4 in which non-permanently installed objects are depicted may be used as the mask information representing the area to be excluded from the target of map information generation.
  • Alternatively, the SLAM processing unit 120 may use the areas A1 and A2 of the captured image 60 in which permanently installed objects are depicted for SLAM processing, and may not use the areas A3 and A4 in which non-permanently installed objects are depicted for SLAM processing.
  • the weighting at the time of SLAM processing may be changed depending on whether or not the objects drawn in each of the areas A1 to A4 are permanent objects.
  • For example, the weighting coefficients are set for the areas A1 to A4 so that the weighting coefficient of the areas A1 and A2 in which permanently installed objects are depicted is larger than the weighting coefficient of the areas A3 and A4 in which non-permanently installed objects are depicted in the captured image 60.
  • The SLAM processing unit 120 may also change the weighting coefficient for each type of object within the areas A in which non-permanently installed objects are depicted. For example, even among objects that are not permanently installed, there are objects that are relatively likely to remain at the same position for a long period of time and objects that are relatively unlikely to do so.
  • For example, large fixtures and furniture are classified as non-permanently installed objects because they may move, but compared with the person 70 and the like, they are likely to remain at the same position for a long period of time. Therefore, the SLAM processing unit 120 may set the weighting coefficients so that, among the areas A in which non-permanently installed objects are depicted, the weighting coefficient becomes larger for an area A in which an object with a lower possibility of moving is depicted.
  • the weighting coefficient for each area A1 to A4 may be set by the conversion unit 102 instead of the SLAM processing unit 120.
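  • As an illustrative sketch only (not from the original text), the following Python fragment shows one way such weighting coefficients could be assigned per object class and applied when aggregating a cost; the class names and the concrete values are assumptions.

```python
# Sketch: per-region weighting coefficients for the SLAM cost, larger for
# permanently installed objects and smaller (or zero) for objects that are
# more likely to move. The concrete values are illustrative only.
WEIGHT_BY_CLASS = {
    "wall": 1.0, "pillar": 1.0, "floor": 1.0,   # permanent: full weight
    "furniture": 0.5,                            # non-permanent but rarely moved
    "box": 0.2, "cart": 0.1,                     # temporarily placed objects
    "person": 0.0,                               # excluded from the cost
}

def region_weight(region_class):
    return WEIGHT_BY_CLASS.get(region_class, 0.3)  # default for unknown classes

def weighted_cost(region_costs):
    """region_costs: list of (class_name, residual) pairs."""
    return sum(region_weight(c) * r for c, r in region_costs)

print(weighted_cost([("wall", 0.4), ("person", 5.0), ("box", 1.0)]))
```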
  • As described above, the information processing device 1 of the present embodiment segments the captured image 60 captured by the imaging device 17 based on the recognition result of the objects depicted in the captured image 60, and performs self-position estimation and map information generation based on the result of the segmentation. Therefore, according to the information processing device 1 of the present embodiment, in addition to the effects of the first embodiment, whether or not each part of the captured image 60 is used for self-position estimation and map information generation, or the strength of its influence on self-position estimation and map information generation, can be adjusted according to the object depicted there, so that the accuracy of self-position estimation and the accuracy of the map information can be improved.
  • the SLAM processing unit 120 described above may execute self-position estimation and map information generation based on the segmentation result of the captured image 60 and the three-dimensional design information such as BIM information.
  • In this case, for an object determined not to be a permanently installed object based on the captured image 60, the SLAM processing unit 120 refers to the three-dimensional design information and judges whether or not the object is included in the design of the building 9. If the object determined not to be a permanently installed object based on the captured image 60 is not registered in the three-dimensional design information, the SLAM processing unit 120 adopts the determination result that the object is not a permanently installed object as it is. On the other hand, if the object determined not to be a permanently installed object based on the captured image 60 is registered in the three-dimensional design information, the SLAM processing unit 120 changes the determination result so that the object is treated as a permanently installed object.
  • The conversion unit 102 may express the accuracy of the determination as to whether or not an object depicted in the captured image 60 is a permanently installed object as a percentage or the like.
  • The SLAM processing unit 120 may refer to the three-dimensional design information when the accuracy of the determination as to whether or not an object depicted in the captured image 60 is a permanently installed object is equal to or less than a reference value, and may judge whether or not the object is included in the design of the building 9.
  • the reference value of the determination accuracy is not particularly limited.
  • the process of comparing the three-dimensional design information with the image recognition result may be executed by the conversion unit 102 instead of the SLAM processing unit 120.
  • With such a configuration, the accuracy of the determination result as to whether or not the object depicted in each area A is a permanently installed object can be improved.
  • The SLAM processing unit 120 may use the segmentation result of the captured image 60 in the present embodiment in combination with either or both of the entry/exit information of the person in the second embodiment described above and the image recognition result of the person 70 in the captured image.
  • (Modification 1) In the first embodiment described above, the initial values of the three-dimensional coordinates of points on surrounding objects in the SLAM process are obtained based on three-dimensional design information such as BIM information. In this modification, the three-dimensional coordinates of points calculated from distance values estimated from the captured image 60 of the surroundings of the information processing device 1 are adopted as the initial values of the three-dimensional coordinates of points on surrounding objects in the SLAM process.
  • the conversion unit 102 of the information processing device 1 estimates the distance (depth) between the object drawn on the captured image 60 and the imaging device 17 based on the captured image 60.
  • the estimation process is called a depth estimation process.
  • the environmental information includes at least the captured image 60.
  • the distance information estimated from the captured image 60 may be used as an example of the environmental information instead of the captured image 60 itself.
  • When the imaging device 17 is a stereo camera, the conversion unit 102 calculates the depth based on the parallax (stereoscopic difference) between the captured images captured by the cameras included in the stereo camera.
  • the imaging device 17 may be a monocular camera.
  • In this case, the conversion unit 102 executes the depth estimation process using a machine learning or deep learning technique. For example, the conversion unit 102 may estimate the distance between an object depicted in the captured image 60 and the imaging device 17 by inputting the captured image 60 captured by the monocular camera into a trained model that outputs a depth map corresponding to the captured image 60.
  • the trained model in this modification is, for example, a model that estimates the depth from a monocular image by estimating an image paired with the monocular image from the monocular image using a stereo image as training data.
  • the method of estimating the depth from a monocular image is not limited to this.
  • the conversion unit 102 sends the distance estimated from the captured image 60 to the SLAM processing unit 120 as an input value for SLAM processing. More specifically, the conversion unit 102 sets the three-dimensional coordinates of the points estimated from the distance estimated from the captured image 60 as the initial values in the bundle adjustment process by the bundle adjustment unit 104. The conversion unit 102 may specify the range of the initial value without specifying the initial value as a unique value.
  • When the position of the imaging device 17 deviates from the center of the information processing device 1, the conversion unit 102 or the SLAM processing unit 120 corrects the distance estimated from the captured image 60 based on the deviation of the position of the imaging device 17 from the center of the information processing device 1.
  • According to this modification, the amount of calculation of the bundle adjustment process and the like can be reduced even when no three-dimensional design information is available.
  • Alternatively, the conversion unit 102 may generate the input value for the distance to an object based on both the distance between the object and the imaging device 17 estimated from the captured image 60 and the distance between the information processing device 1 and the object calculated from the three-dimensional design information. For example, the conversion unit 102 may use the three-dimensional coordinates of a point obtained from the average of the distance estimated from the captured image 60 and the distance calculated from the three-dimensional design information as the initial value in the bundle adjustment process, as in the sketch below.
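  • The following Python fragment is an illustrative sketch only (not from the original text) of how a pixel with an estimated depth could be back-projected and blended with the depth derived from the design information to obtain such an initial value; the camera intrinsics and the simple averaging rule are assumptions.

```python
# Sketch: back-projecting a pixel with an estimated depth into 3D and blending
# it with the depth predicted from the three-dimensional design information to
# obtain an initial value for the bundle adjustment.
import numpy as np

K = np.array([[525.0,   0.0, 319.5],   # hypothetical pinhole intrinsics
              [  0.0, 525.0, 239.5],
              [  0.0,   0.0,   1.0]])

def back_project(u, v, depth, K):
    """Pixel (u, v) with depth [m] -> 3D point in the camera frame."""
    uv1 = np.array([u, v, 1.0])
    return depth * (np.linalg.inv(K) @ uv1)

def initial_point(u, v, depth_from_image, depth_from_bim, K):
    depth = 0.5 * (depth_from_image + depth_from_bim)  # simple average
    return back_project(u, v, depth, K)

print(initial_point(400, 260, depth_from_image=3.1, depth_from_bim=3.0, K=K))
```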
  • the SLAM processing unit 120 generates a point cloud map, but the mode of the three-dimensional representation is not limited to the point cloud map.
  • the SLAM processing units 120, 1120, and 2120 may generate a set of a plurality of figures having three-dimensional coordinates as map information.
  • FIG. 13 is a diagram showing an example of map information according to the modified example 2.
  • The map information 500 shown in FIG. 13 is obtained by applying a plurality of triangular figures (a triangular-patch cloud) 501a to 501f (hereinafter referred to as triangular patches 501) to the two-dimensional captured image 45.
  • Each triangular patch 501 is a plane figure, but its position and orientation can be changed in three-dimensional space.
  • the orientation of the triangular patch 501 is represented by the normal vector n.
  • the position of the triangular patch 501 is represented by three-dimensional coordinates.
  • the position and orientation of each triangular patch 501 corresponds to the depth of the two-dimensional captured image 45.
  • the SLAM processing unit 120 generates three-dimensional map information by optimizing the positions of the center points and the normal vectors of the plurality of triangular patches 501 applied to the captured image 45.
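  • As an illustrative sketch (not part of the original description), the following Python fragment shows one possible parameterization of such a triangular patch by its center point and normal vector, which an optimizer could adjust; the class layout and values are assumptions.

```python
# Sketch: a triangular patch parameterized by its centre point and its normal
# vector, as one possible building block of the map representation above.
from dataclasses import dataclass
import numpy as np

@dataclass
class TriangularPatch:
    center: np.ndarray   # 3D coordinates of the patch centre
    normal: np.ndarray   # unit normal vector n of the patch plane

    def plane_offset(self):
        """Plane equation n . x = d for points x on the patch."""
        return float(self.normal @ self.center)

patch = TriangularPatch(center=np.array([1.0, 0.5, 3.0]),
                        normal=np.array([0.0, 0.0, 1.0]))
print(patch.plane_offset())   # optimization would adjust center and normal
```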
  • By generating the map information as a set of triangular patches 501 in this way, the information processing device 1 of this modification can generate map information that closely expresses the surrounding environment while reducing the amount of calculation as compared with the case where the three-dimensional coordinates of points in the three-dimensional space are calculated individually.
  • the triangular patch 501 is applied to the captured image 45, but the triangular patch 501 may be applied to the BIM information.
  • the conversion unit 102, 1102, 2102 (hereinafter, referred to as the conversion unit 102) may apply the triangular patch 501 to the three-dimensional design information such as BIM information.
  • the conversion unit 102 can determine the boundary of the triangular patch 501 as the boundary of the three-dimensional structure based on the BIM information, in addition to the boundary drawn as an edge on the captured image.
  • When this configuration is adopted, the SLAM processing unit 120 can generate more accurate map information by correcting, based on the SLAM result, the positions and orientations of the plurality of triangular patches 501 applied by the conversion unit 102.
  • the figures constituting the map information are not limited to the triangular patch 501, and the SLAM processing unit 120 may generate the map information by mesh representation or three-dimensional polygons.
  • In the embodiments described above, the environmental information is the three-dimensional design information, the entry/exit information of a person in the building 9, or the image recognition result of a person in the captured image, but the environmental information is not limited to these.
  • the environmental information includes information on at least one of the ambient lighting and the weather.
  • the information regarding the ambient lighting is, for example, information indicating whether the lighting for each room or floor of the building 9 is on or off.
  • the information on the weather is information on sunshine conditions such as sunny, cloudy, and rainy in the area including the building 9.
  • the environmental information may include both information on ambient lighting and information on weather, or may include only one or the other.
  • the acquisition units 101 and 1101 acquire information on ambient lighting or weather from the external device 2.
  • the conversion unit 102 generates mask information representing a region where the captured image is likely to be deteriorated, based on the information on the ambient lighting or the weather acquired by the acquisition unit 101.
  • the mask information of the second embodiment may be distinguished as the first mask information, and the mask information of the present modification may be distinguished as the second mask information.
  • the SLAM processing unit 120 of this modification does not use the captured image at least for either self-position estimation or map information generation in the region corresponding to the mask information.
  • That is, the SLAM processing unit 120 may refrain from using the captured image for both the self-position estimation process and the map information generation process in the area corresponding to the mask information, or may refrain from using it for only one of the processes.
  • For example, the SLAM processing unit 120 may use the captured image of the area corresponding to the mask information for the self-position estimation process for movement, while not using it for generating the map information.
  • overexposed areas or underexposed areas may occur in the captured image depending on the lighting or sunshine conditions.
  • the use of such areas may reduce the accuracy of self-position estimation or map information.
  • In this modification, the deterioration of the accuracy of the self-position estimation or the map information is reduced by not using the captured image for self-position estimation or map information generation in areas where such an event may occur.
  • The SLAM processing unit 120 may, instead of not using the captured image at all for self-position estimation or map information generation in the area corresponding to the mask information, use it with a lower priority.
  • For example, when the information processing device 1 includes a sensor or the like for detecting the surrounding state in addition to the imaging device 17, the SLAM processing unit 120 uses the detection result of the sensor or the like in preference to the captured image for self-position estimation or map information generation in the area corresponding to the mask information.
  • the conversion unit 102 may change the gradation of the captured image based on the environmental information. For example, the conversion unit 102 reduces overexposure or underexposure by changing the dynamic range of the captured image based on information about ambient lighting or weather.
  • the SLAM processing unit 120 executes self-position estimation and map information generation based on the captured image whose gradation has been changed by the conversion unit 102.
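  • The following Python fragment is an illustrative sketch only (not from the original text) of one simple way the gradation of a captured image could be adjusted according to lighting or weather information, here by condition-dependent gamma correction; the condition keys and gamma values are assumptions.

```python
# Sketch: adjusting the tone of a captured image before SLAM processing based
# on the lighting / weather information. Gamma correction with condition-
# dependent values is one simple possibility; the values are illustrative.
import numpy as np

GAMMA_BY_CONDITION = {
    ("lights_on", "sunny"): 1.0,
    ("lights_on", "cloudy"): 0.9,
    ("lights_off", "sunny"): 0.8,   # brighten dark indoor areas
    ("lights_off", "rainy"): 0.7,
}

def adjust_gradation(image, lighting, weather):
    """image: float array in [0, 1]; returns gamma-corrected image."""
    gamma = GAMMA_BY_CONDITION.get((lighting, weather), 1.0)
    return np.clip(image, 0.0, 1.0) ** gamma

img = np.random.rand(4, 4)
print(adjust_gradation(img, "lights_off", "rainy").max())
```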
  • According to the information processing device 1 of this modification, it is possible to robustly estimate the self-position and generate map information in accordance with the surrounding environment such as lighting conditions or sunshine conditions.
  • the environmental information may include three-dimensional design information and process information representing the construction process of the building 9.
  • the process information in this modification is information representing the construction schedule or timeline (schedule) of the building 9.
  • When the building 9 is in the process of being built, collating the three-dimensional design information such as BIM information with the process information makes it possible to distinguish between areas where the construction of the building 9 has been completed and areas that are still under construction.
  • Since the 3D design information basically represents a 3D model of the building 9 in its completed state, there is a high possibility of a difference between the 3D design information and the actual state of the building 9 in areas that are still under construction.
  • the conversion unit 102 of this modification generates unfinished area information representing an area of the building 9 in which the construction has not been completed, based on the three-dimensional design information and the process information.
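  • As a purely illustrative sketch (not part of the original description), the following Python fragment shows one way unfinished areas could be derived by collating design elements with a construction schedule; the record fields and the completion-date rule are assumptions.

```python
# Sketch: deriving "unfinished area" information by collating design elements
# with the construction schedule (process information).
from dataclasses import dataclass
from datetime import date

@dataclass
class ScheduledElement:
    element_id: str    # id of a BIM element (wall, pillar, ...)
    area_id: str       # room / floor the element belongs to
    completion: date   # scheduled completion date from the process information

def unfinished_areas(elements, today):
    """Areas containing at least one element not yet scheduled to be complete."""
    return {e.area_id for e in elements if e.completion > today}

elements = [
    ScheduledElement("wall-301", "floor-3", date(2021, 3, 1)),
    ScheduledElement("wall-401", "floor-4", date(2021, 6, 1)),
]
print(unfinished_areas(elements, date(2021, 4, 15)))  # {'floor-4'}
```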
  • the SLAM processing unit 120 of this modified example executes self-position estimation and map information generation for the area corresponding to the unfinished area information without using the three-dimensional design information. For example, the SLAM processing unit 120 executes self-position estimation and map information generation based on the detection result of the captured image or the sensor for the region corresponding to the unfinished region information.
  • The information processing device 1 of this modification does not use the three-dimensional design information in areas where there is a high possibility of a difference between the three-dimensional design information and the actual state of the building 9, so that the decrease in the accuracy of self-position estimation and map information can be reduced even while the building 9 is under construction.
  • the SLAM processing unit 120 may use the three-dimensional design information at a lower priority in the area corresponding to the unfinished area information, instead of not using the three-dimensional design information at all.
  • Further, the map information may be generated only in the area corresponding to the unfinished area information. That is, the SLAM processing unit 120 assumes that the structure of the building 9 does not change in areas where the construction has been completed, and reduces the amount of calculation by generating map information only in areas where the structure of the building 9 changes, that is, the area corresponding to the unfinished area information.
  • On the other hand, the tracking unit 103 of the SLAM processing unit 120 may use, in the tracking process, captured images obtained by capturing areas other than the area corresponding to the unfinished area information. This is because, for the area corresponding to the unfinished area information, the structure that is the subject changes due to the construction work, so it may be difficult to track the point 50 between captured images captured at different times.
  • In the embodiments described above, the information processing device 1 executes the self-position estimation process and the map information generation process, but a configuration in which the external device 2 executes the self-position estimation process and the map information generation process may be adopted.
  • the external device 2 may execute the position estimation process of the information processing device 1 and the map information generation process based on the detection result acquired from the information processing device 1 and the environmental information.
  • the external device 2 may be used as an example of the information processing device.
  • the expression "at least one of a, b and c (one)” or “at least one of a, b or c (one)” (including similar expressions). ) Is used, it includes any of a, b, c, ab, ac, bc, or abc. It may also include multiple instances of any element, such as a-a, a-b-b, a-a-b-b-c-c, and the like. It also includes adding elements other than the listed elements (a, b and c), such as having d, such as a-b-c-d.
  • When the terms "connected" and "coupled" are used, they are intended as non-limiting terms that include direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like.
  • The terms should be interpreted as appropriate according to the context in which they are used, but they should not be interpreted restrictively so as to exclude any form of connection/coupling that is not intentionally or naturally excluded.
  • When an expression such as "element A configured to perform operation B" is used, it may include that the physical structure of the element A has a configuration capable of executing the operation B, and that a permanent or temporary setting (setting/configuration) of the element A is configured (set) to actually execute the operation B.
  • For example, when the element A is a general-purpose processor, it suffices that the processor has a hardware configuration capable of executing the operation B and is configured to actually execute the operation B by a permanent or temporary setting of a program (instructions).
  • When the element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it suffices that the circuit structure of the processor is implemented so as to actually execute the operation B, regardless of whether or not control instructions and data are actually attached.
  • Terms relating to finding an optimal value include finding a global optimal value, finding an approximation of a global optimal value, finding a local optimal value, and finding an approximation of a local optimal value, and should be interpreted as appropriate according to the context in which the term is used. They also include probabilistically or heuristically finding approximate values of these optimal values.
  • When a plurality of pieces of hardware perform a predetermined process, the respective pieces of hardware may cooperate to perform the predetermined process, or some of the hardware may perform all of the predetermined process. Further, some hardware may perform a part of the predetermined process, and other hardware may perform the rest of the predetermined process.
  • When expressions such as "one or more pieces of hardware perform a first process and the one or more pieces of hardware perform a second process" are used, the hardware that performs the first process and the hardware that performs the second process may be the same or different. That is, the hardware that performs the first process and the hardware that performs the second process may be included in the one or more pieces of hardware.
  • the hardware may include an electronic circuit, a device including the electronic circuit, or the like.
  • Each storage device (memory) among the plurality of storage devices (memories) may store only a part of the data, or may store the whole of the data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Automation & Control Theory (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The information processing device according to an embodiment is provided with at least one memory and at least one processor. The at least one processor is configured to be capable of executing: acquiring a detection result and environmental information, the detection result including at least one of the state of the surroundings of the information processing device or the state of the information processing device, the environmental information relating to the environment of the surroundings of the information processing device; and executing estimation of the self-location and generation of map information on the basis of the environmental information and the detection result.

Description

Information processing device, information processing method, and program

An embodiment of the present invention relates to an information processing device, an information processing method, and a program.

Conventionally, robots and the like are known that estimate their own position and generate map information by recognizing the positions and shapes of surrounding objects from the sensing results of sensors or from captured images.

Japanese Unexamined Patent Publication No. 2003-015739

However, in the prior art, it may be difficult to estimate the self-position and generate map information with high accuracy.

The information processing device of an embodiment includes at least one memory and at least one processor. The at least one processor is configured to be capable of acquiring a detection result including at least one of the surrounding state of the information processing device or the state of the information processing device, and environmental information relating to the environment around the information processing device, and of estimating the self-position and generating map information based on the environmental information and the detection result.

FIG. 1 is a block diagram showing an example of the hardware configuration of the information processing apparatus according to the first embodiment.
FIG. 2 is a block diagram showing an example of a function provided in the information processing apparatus according to the first embodiment.
FIG. 3 is an image diagram showing an example of tracking processing according to the first embodiment.
FIG. 4 is an image diagram showing an example of the positional relationship between the information processing apparatus according to the first embodiment and surrounding objects.
FIG. 5 is an image diagram showing an example of bundle adjustment according to the first embodiment.
FIG. 6 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the first embodiment.
FIG. 7 is a block diagram showing an example of the functions provided in the information processing apparatus according to the second embodiment.
FIG. 8 is an image diagram showing an example of the positional relationship between the information processing apparatus according to the second embodiment and surrounding objects.
FIG. 9 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the second embodiment.
FIG. 10 is a block diagram showing an example of the functions provided in the information processing apparatus according to the third embodiment.
FIG. 11 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the third embodiment.
FIG. 12 is a diagram showing an example of segmentation of a captured image according to the fourth embodiment.
FIG. 13 is a diagram showing an example of map information according to the second modification.

(First Embodiment)
FIG. 1 is a block diagram showing an example of the hardware configuration of the information processing device 1 according to the first embodiment. As an example, the information processing device 1 includes a main body 10, a moving device 16, an imaging device 17, and an IMU (Inertial Measurement Unit) sensor 18.

The moving device 16 is a device capable of moving the information processing device 1. As an example, the moving device 16 has a plurality of wheels and a motor that drives the wheels, and is connected to the lower part of the main body 10 so as to support the main body 10.

It is assumed that the information processing device 1 can move, by means of the moving device 16, through a building under construction, a completed building, a station platform, a factory, or the like. In the present embodiment, the case where the information processing device 1 moves through a building under construction will be described as an example.

The means of movement of the information processing device 1 is not limited to wheels, and may be crawler tracks, propellers, or the like. The information processing device 1 is, for example, a robot, a drone, or the like. In the present embodiment, the information processing device 1 is assumed to move autonomously, but it is not limited to this.

The imaging device 17 is, for example, a stereo camera in which two cameras arranged side by side form one set. The imaging device 17 sends the image data captured by each of the two cameras to the main body 10 in association with each other.

The IMU sensor 18 is a sensor in which a gyro sensor, an acceleration sensor, and the like are integrated, and measures the angular velocity and acceleration of the information processing device 1. The IMU sensor 18 sends the measured angular velocity and acceleration to the main body 10. The IMU sensor 18 may further include not only a gyro sensor and an acceleration sensor but also a magnetic sensor, a GPS (Global Positioning System) device, and the like.

In the present embodiment, the imaging device 17 and the IMU sensor 18 are collectively referred to as a detection unit. The detection unit may further include various other sensors. As an example, the information processing device 1 may further include a distance measuring sensor such as an ultrasonic sensor or a laser scanner.

In the present embodiment, "detection" includes imaging the surroundings of the information processing device 1, measuring the angular velocity or acceleration of the information processing device 1, measuring the distance to objects around the information processing device 1, and the like. Further, in the present embodiment, the detection result by the detection unit includes at least one of the surrounding state of the information processing device 1 and the state of the information processing device 1. In other words, the detection result may include both information on the surrounding state of the information processing device 1 and information on the state of the information processing device 1, or may include only one of them.

The surrounding state of the information processing device 1 is, for example, a captured image of the surroundings of the information processing device 1, a measurement result of the distance between the information processing device 1 and a surrounding object, or the like. The state of the information processing device 1 is, for example, the angular velocity and acceleration measured by the IMU sensor 18. For example, a captured image captured by the imaging device 17 is an example of a detection result of the surrounding state of the information processing device 1. In the present embodiment, the detection result includes at least a captured image, but may further include other information.
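As an illustration only (the publication does not prescribe any particular data structure, and all names below are hypothetical), a detection result bundling the surrounding state and the device state could be represented as follows:

    from dataclasses import dataclass
    from typing import Optional
    import numpy as np

    @dataclass
    class DetectionResult:
        # Surrounding state: a stereo pair captured by the imaging device.
        left_image: Optional[np.ndarray] = None
        right_image: Optional[np.ndarray] = None
        # Device state: angular velocity (rad/s) and acceleration (m/s^2) from the IMU.
        angular_velocity: Optional[np.ndarray] = None
        acceleration: Optional[np.ndarray] = None
        timestamp: float = 0.0

    # Consistent with the text above, either part may be omitted: a result may carry
    # only the images, only the IMU readings, or both.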

As an example, the main body 10 includes a processor 11, a main storage device 12 (memory), an auxiliary storage device 14 (memory), a network interface 13, and a device interface 15, and may be realized as a computer in which these components are connected via a bus 19. A configuration in which the imaging device 17 and the IMU sensor 18 are built into the main body 10 may also be adopted.

The processor 11 may be an electronic circuit including a control device and an arithmetic device of a computer (a processing circuit, processing circuitry, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), or the like). The processor 11 may also be a semiconductor device or the like including a dedicated processing circuit. The processor 11 is not limited to an electronic circuit using electronic logic elements, and may be realized by an optical circuit using optical logic elements. The processor 11 may also include an arithmetic function based on quantum computing.

The processor 11 can perform arithmetic processing based on data and software (programs) input from the devices of the internal configuration of the information processing device 1, and can output the arithmetic results and control signals to those devices. The processor 11 may control the components constituting the information processing device 1 by executing the OS (Operating System) of the information processing device 1, applications, and the like.

The main storage device 12 is a storage device that stores instructions executed by the processor 11, various data, and the like, and the information stored in the main storage device 12 is read out by the processor 11. The auxiliary storage device 14 is a storage device other than the main storage device 12. These storage devices mean any electronic components capable of storing electronic information, and may be semiconductor memories. A semiconductor memory may be either a volatile memory or a non-volatile memory. The storage device for storing various data in the information processing device 1 in the present embodiment may be realized by the main storage device 12 or the auxiliary storage device 14, or may be realized by a built-in memory of the processor 11. The main storage device 12 or the auxiliary storage device 14 is also referred to as a storage unit.

A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor. When the information processing device 1 in the present embodiment is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to the at least one storage device (memory), a configuration in which at least one of the plurality of processors is connected (coupled) to the at least one storage device (memory) may be included. This configuration may also be realized by storage devices (memories) and processors included in a plurality of computers. Furthermore, a configuration in which the storage device (memory) is integrated with the processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.

The network interface 13 is an interface for connecting to the communication network 3 wirelessly or by wire. As the network interface 13, an appropriate interface, such as one conforming to an existing communication standard, may be used. Information may be exchanged via the network interface 13 with the external device 2 connected through the communication network 3. The communication network 3 may be any of a WAN (Wide Area Network), a LAN (Local Area Network), a PAN (Personal Area Network), or a combination thereof, as long as information is exchanged between the information processing device 1 and the external device 2. Examples of a WAN include the Internet, examples of a LAN include IEEE 802.11 and Ethernet (registered trademark), and examples of a PAN include Bluetooth (registered trademark) and NFC (Near Field Communication).

The device interface 15 is an interface that directly connects to the moving device 16, the imaging device 17, and the IMU sensor 18. The device interface 15 is, for example, an interface conforming to a standard such as USB (Universal Serial Bus), but is not limited thereto. The device interface 15 may also be connected to external devices other than the various devices shown in FIG. 1.

The external device 2 is, for example, a server device or the like. The external device 2 is connected to the information processing device 1 via the communication network 3.

The external device 2 of the present embodiment stores three-dimensional design information of the building in advance. The three-dimensional design information of the building is, for example, BIM (Building Information Modeling) information. The BIM information includes information on the three-dimensional structure of the building and information on materials such as building materials. The three-dimensional design information of the building is not limited to BIM information, and may be 3D CAD (Computer-Aided Design) data or the like.

The three-dimensional design information is an example of environmental information in the present embodiment. Environmental information is information on the environment around the information processing device 1. For example, the environmental information includes at least one of information on the building in which the information processing device 1 travels, information on people or objects existing around the information processing device 1, information on the weather, and information on lighting. More specifically, the above-mentioned three-dimensional design information is an example of the information on the building in which the information processing device 1 travels. The environmental information may be a combination of a plurality of types of information, or may include only one type of information. For example, in the present embodiment, the environmental information includes at least the three-dimensional design information, but may further include other information on the environment around the information processing device 1.

The term "object" in the present embodiment includes structures such as walls and pillars, fixtures, furniture, moving bodies, temporary installations, people, and the like.

In the present embodiment, the information processing device 1 and the external device 2 are connected wirelessly, but they may instead be connected by wire. Further, the information processing device 1 does not have to be constantly connected to the external device 2.

Next, the functions of the information processing device 1 will be described. FIG. 2 is a block diagram showing an example of the functions provided in the information processing device 1 according to the first embodiment.

As shown in FIG. 2, the information processing device 1 includes an acquisition unit 101, a conversion unit 102, a SLAM (Simultaneous Localization and Mapping, or Simultaneously Localization and Mapping) processing unit 120, and a movement control unit 105. The SLAM processing unit 120 includes a tracking unit 103 and a bundle adjustment unit 104.

The acquisition unit 101 acquires a detection result of the surrounding state of the information processing device 1 or the state of the information processing device 1, and environmental information on the environment around the information processing device 1.

More specifically, the acquisition unit 101 acquires the BIM information from the external device 2 via, for example, the network interface 13. The acquisition unit 101 stores the acquired BIM information in the auxiliary storage device 14.

The acquisition unit 101 also acquires captured images from the imaging device 17 via the device interface 15. In addition, the acquisition unit 101 acquires the angular velocity and acceleration from the IMU sensor 18 via the device interface 15.

The conversion unit 102 converts the environmental information into an input value of at least one of the self-position estimation process and the map information generation process performed by the SLAM processing unit 120 described later. The conversion unit 102 may convert the environmental information into input values for both the self-position estimation process and the map information generation process, or into an input value for only one of them. In the present embodiment, "conversion" includes generating other information from the environmental information, and extracting, acquiring, or retrieving information from the environmental information.

In the present embodiment, the conversion unit 102 generates, from the BIM information, initial values of the three-dimensional coordinates (world coordinates) of points in three-dimensional space that are used in the bundle adjustment process performed by the bundle adjustment unit 104 described later.

For example, based on the current position and posture of the imaging device 17 identified by the tracking unit 103 described later, the conversion unit 102 identifies, among the structures such as walls and pillars included in the BIM information, the structures included in the imaging range of the imaging device 17. The conversion unit 102 then specifies, from the BIM information, the three-dimensional coordinates (world coordinates) of the structures included in the imaging range of the imaging device 17. As an example, the conversion unit 102 acquires from an external device or the like the world coordinates of one point of the building represented by the BIM information, and, with that point as a reference, converts the three-dimensional coordinates in the BIM information of each point of the building into world coordinates. The method of obtaining the world coordinates of each point of the building from the BIM information is not limited to this. The conversion unit 102 sends the specified three-dimensional coordinates (world coordinates) to the bundle adjustment unit 104 as initial values of the three-dimensional coordinates (world coordinates) of points in three-dimensional space used in the bundle adjustment process described later. Details of the bundle adjustment process will be described later. The three-dimensional coordinates (world coordinates) of the points of the building 9 that the conversion unit 102 specifies from the BIM information are an example of information on the positions of surrounding objects in the present embodiment. Hereinafter, unless otherwise specified, three-dimensional coordinates in the present embodiment are world coordinates.
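As a minimal sketch only (assuming the correspondence between the BIM coordinate system and the world coordinate system can be expressed as a single rigid transform, which the publication does not state explicitly; all names and numbers are illustrative), the conversion of BIM coordinates into world-coordinate initial values might look like this:

    import numpy as np

    def bim_to_world(points_bim: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
        """Convert Nx3 BIM coordinates to world coordinates as x_w = R @ x_bim + t."""
        return points_bim @ R.T + t

    # If only one reference-point correspondence is known and the axes are assumed
    # to be aligned, R is the identity and t is the offset implied by that point.
    ref_bim = np.array([10.0, 5.0, 0.0])    # reference point in BIM coordinates (illustrative)
    ref_world = np.array([2.0, -1.0, 0.0])  # the same point in world coordinates (illustrative)
    R, t = np.eye(3), ref_world - ref_bim

    wall_points_bim = np.array([[10.0, 5.0, 0.0],
                                [12.0, 5.0, 3.0]])        # points on a structure from the BIM data
    initial_values = bim_to_world(wall_points_bim, R, t)  # used to seed the bundle adjustment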

The conversion unit 102 may specify a range of initial values instead of specifying a single initial value. For example, the conversion unit 102 may give the three-dimensional coordinates of a point in three-dimensional space a range rather than specifying them as unique coordinates. In this case, a three-dimensional spatial region that is likely to contain a given point of a structure included in the imaging range of the imaging device 17 is sent to the bundle adjustment unit 104 as the range of initial values. The initial value or the range of initial values is an example of the input value generated by the conversion unit 102 in the present embodiment.

In the present embodiment, it is assumed that the three-dimensional coordinate system in the BIM information and the three-dimensional coordinate system in the SLAM processing unit 120 (the SLAM coordinate system) are calibrated in advance. In the present embodiment, calibration means that the correspondence between positions in the BIM information and positions in the SLAM coordinate system is defined. For example, the position of the movement start point of the information processing device 1 in the three-dimensional coordinate system of the BIM information may be stored in the auxiliary storage device 14 as a reference point. The conversion unit 102 can therefore specify the position in the building represented by the BIM information that corresponds to the position of the imaging device 17 identified by the tracking unit 103.

The calibration between the three-dimensional coordinate system of the BIM information and the SLAM coordinate system may be executed by an input operation by an administrator or the like, or may be executed by the SLAM processing unit 120 recognizing, from a captured image, a marker such as an AR (Augmented Reality) marker installed in the building.

The SLAM processing unit 120 performs self-position estimation and map information generation simultaneously. In the present embodiment, the self-position is the position and posture of the information processing device 1. Further, in the present embodiment, since the imaging device 17 is mounted on the information processing device 1, the position and posture of the imaging device 17 are treated as representing the position and posture of the information processing device 1. When the imaging device 17 is installed at a position away from the center of the information processing device 1, the SLAM processing unit 120 corrects the positional offset between the imaging device 17 and the center of the information processing device 1 and estimates the self-position, that is, the position of the information processing device 1.

The map information represents the shapes of surrounding structures along the movement trajectory of the information processing device 1. More specifically, the map information of the present embodiment represents, in three dimensions, the internal structure of the building in which the information processing device 1 travels, along the movement trajectory of the information processing device 1. The map information of the present embodiment is, for example, a point cloud map in which the internal structure of the building is represented as a point cloud having three-dimensional coordinates. The type of map information is not limited to this, and the map may be represented by a set of three-dimensional figures instead of a point cloud. The map information is also referred to as an environment map.

The SLAM processing unit 120 is an example of an estimation unit in the present embodiment. A method other than SLAM may be adopted for self-position estimation and map information generation. Further, the self-position estimation and the map information generation do not have to be performed simultaneously; one process may be completed first and the other executed afterwards. In addition, generating map information in this specification includes at least one of newly generating map information, adjusting generated map information, and updating generated map information.

The SLAM processing unit 120 includes a tracking unit 103 and a bundle adjustment unit 104.

The tracking unit 103 identifies the position and posture of the imaging device 17 by tracking a plurality of images captured by the imaging device 17 at different times. The tracking unit 103 is an example of a specifying unit in the present embodiment.

For example, in the present embodiment, the imaging device 17 captures its surroundings while moving together with the information processing device 1. The tracking unit 103 calculates changes in the position and posture of the imaging device 17 by tracking points drawn in one captured image across other images captured at different times. The tracking unit 103 identifies the current position and posture of the imaging device 17 by adding the changes in position and posture identified by the tracking process to the position and posture of the imaging device 17 at the start of imaging.
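Expressed as a formula (a standard composition of rigid-body transforms, shown only to illustrate the accumulation described above; the publication does not give this expression), the current pose can be written as

    T_{\mathrm{current}} = T_0 \, \Delta T_1 \, \Delta T_2 \cdots \Delta T_n

where T_0 is the pose of the imaging device 17 at the start of imaging and each ΔT_k is the relative change in position and posture estimated between successive frames.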

FIG. 3 is an image diagram showing an example of the tracking process according to the first embodiment. The reference frame 41 and the target frame 42 are images captured by the imaging device 17 at different times. The reference frame 41 is an image captured before the target frame 42, and the imaging device 17 is assumed to have moved from the position T_i at the time the reference frame 41 was captured to the position T_j at the time the target frame 42 was captured. The reference frame 41 is also referred to as a key frame, and the target frame 42 as the current frame.

In this case, the tracking unit 103 calculates the relative amount of movement of the imaging device 17 from the position T_i to the position T_j by calculating the photometric error that arises when the point P drawn in the reference frame 41 is drawn in the target frame 42. The point P is, for example, a feature point in the reference frame 41. The movement of the imaging device 17 includes both a change in the position of the imaging device 17 and a change in its posture (orientation).

It is assumed that the position T_i at the time the reference frame 41 was captured has already been error-corrected. The point 50a shown in FIG. 3 represents the position at which the point P drawn in the reference frame 41 is back-projected into three-dimensional space.

As an example, the tracking unit 103 calculates the photometric error E_pj between the reference frame 41 and the target frame 42 using the following equation (1).

[Equation (1): the photometric error E_pj, presented as an image in the original publication and not reproduced here.]

I_i represents the reference frame 41, and I_j represents the target frame 42. N_p is a neighborhood pattern of pixels including the point P in the reference frame 41. t_i is the exposure time of the reference frame 41, and t_j is the exposure time of the target frame 42. p' is the projection point of P in the target frame 42 obtained with the inverse depth d_p. As shown in equation (1), the tracking unit 103 calculates the photometric error E_pj using the Huber norm. The weighting coefficient W_p is calculated in advance from the intensity gradient of the pixels. For example, noise can be reduced by decreasing the value of the weighting coefficient W_p for pixels with a large gradient. A known method can be applied for calculating the weighting coefficient W_p. The brightness conversion hyperparameters a_i, a_j, b_i, and b_j are parameters for converting the brightness between the reference frame 41 and the target frame 42. The brightness conversion hyperparameters a_i, a_j, b_i, and b_j may be tuned manually, for example by an administrator.
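Since equation (1) is not reproduced in this text, one plausible form is given here for reference only. It is consistent with the terms described above (the Huber norm ‖·‖_γ, the weight W_p, the exposure times t_i and t_j, and the brightness conversion parameters a_i, a_j, b_i, b_j) and matches the photometric error commonly used in direct visual odometry, but it is an interpretation, not the exact expression of equation (1):

    E_{pj} = \sum_{p \in N_p} W_p \left\| \left( I_j[\,p'\,] - b_j \right) - \frac{t_j\, e^{a_j}}{t_i\, e^{a_i}} \left( I_i[\,p\,] - b_i \right) \right\|_{\gamma}

where the sum runs over the neighborhood pattern N_p of the point P.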

The following equation (2) is the constraint condition for the point P', which is the projection point of the point P used in equation (1). To calculate the point P', a back-projection function that back-projects the point P drawn in the reference frame 41 into three-dimensional space as the point 50a, and a projection function that projects the point 50a in three-dimensional space onto the target frame 42, are used. The distance from the point P to the point 50a is the depth (d_p) of the point 50a in the reference frame 41.

[Equation (2): the constraint on the projection point P', presented as an image in the original publication and not reproduced here.]

The coefficient R included in equation (2) represents the amount of rotation of the imaging device 17, and the coefficient t represents the amount of translation of the imaging device 17. The coefficients R and t are defined from the relative positions of the imaging device 17 by the following constraint condition (3).

[Equation (3): the constraint defining R and t from the relative positions of the imaging device, presented as an image in the original publication and not reproduced here.]
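For reference only, forms consistent with the description above (a back-projection of P with depth d_p followed by a projection into the target frame, with R and t defined by the relative poses) would be

    p' = \Pi\!\left( R \, \Pi^{-1}\!\left( p, d_p \right) + t \right), \qquad
    \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} = T_j \, T_i^{-1}

where Π is the projection function of the imaging device 17 and Π^{-1} the corresponding back-projection. These are interpretations based on the surrounding text, not the exact expressions of equations (2) and (3).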

The tracking unit 103 identifies the position T_j of the imaging device 17 at the time the target frame I_j was captured by solving the model of the photometric error E_pj between the reference frame I_i and the target frame I_j shown in equations (1) to (3) above. The positions T_i and T_j shown in equation (3) and FIG. 3 include the position and orientation of the imaging device 17. The tracking unit 103 thus tracks changes in the position and posture of the imaging device 17 by repeatedly executing this tracking process on the series of images captured over time by the imaging device 17.

The tracking method is not limited to the above example. For example, tracking methods include an indirect method, which acquires feature points in the captured images and then obtains the position and posture of the imaging device 17 at the time each frame was captured by solving a matching problem between the feature points, and a direct method, which estimates the position and posture of the imaging device 17 at the time each frame was captured by directly estimating the transformation between captured images without a feature point extraction step. In the above example, the movement of the position and posture of the imaging device 17 is calculated by projecting feature points, but the tracking unit 103 may instead perform tracking by the direct method. The tracking unit 103 may also identify the position and orientation of the imaging device 17 by taking into account not only the captured images but also the detection results of the IMU sensor 18.

The tracking unit 103 sends the identified current position and posture of the imaging device 17 to the bundle adjustment unit 104 and the conversion unit 102.

Returning to FIG. 2, the bundle adjustment unit 104 corrects the position and posture of the imaging device 17 identified by the tracking unit 103 and the position information of surrounding objects through a bundle adjustment process. The bundle adjustment unit 104 outputs, as the processing result, the self-position of the information processing device 1 and the map information.

More specifically, the bundle adjustment unit 104 minimizes the reprojection error for each frame of the images captured by the imaging device 17.

To do so, the bundle adjustment unit 104 minimizes the reprojection error of each frame by optimizing the world coordinate points (three-dimensional position coordinates) of the points in the surrounding environment, the position and posture of the imaging device 17, and the internal parameters of the imaging device 17.

The internal parameters of the imaging device 17 need not be updated by the bundle adjustment unit 104 if the camera has been calibrated in advance. The internal parameters of the imaging device 17 are, for example, the focal length and the principal point. In bundle adjustment, the position and posture of the imaging device 17 are also referred to as external parameters.

The bundle adjustment unit 104 of the present embodiment adopts the three-dimensional coordinates indicating the positions of surrounding structures, converted from the BIM information by the conversion unit 102, as the initial values of the world coordinate points of the surrounding environment described above.

The bundle adjustment unit 104 also adjusts, through this bundle adjustment, the error in the position and posture of the imaging device 17 identified by the tracking unit 103. Since the bundle adjustment unit 104 finds the world coordinate points, the position and posture of the imaging device 17, and the internal parameters of the imaging device 17 that minimize the reprojection error, the position and posture of the imaging device 17 with reduced error are obtained as a result. The set of world coordinate points after bundle adjustment becomes the map information.

FIG. 4 is an image diagram showing an example of the positional relationship between the information processing device 1 according to the first embodiment and surrounding objects. In the example shown in FIG. 4, the information processing device 1 moves through a building 9 in which pillars 90a to 90c are installed. The pillars 90a to 90c are examples of objects. The distance d in FIG. 4 is the distance from the imaging device 17 to a point 52 on a plane 901 of the pillar 90c facing the information processing device 1.

For example, assume that the conversion unit 102 has specified the initial value of the three-dimensional coordinates of the point 52 on the plane 901. In this case, the bundle adjustment unit 104 starts the adjustment process from that initial value and adjusts the errors in the self-position and the position of the point 52 based on the position and posture of the imaging device 17 identified by the tracking unit 103 and the captured images. For example, by adjusting the errors in the self-position and the position of the point 52, the bundle adjustment unit 104 corrects the three-dimensional coordinates of the point 52 and obtains more accurate three-dimensional coordinates. Through this adjustment process, the bundle adjustment unit 104 estimates the self-position and the three-dimensional coordinates of the point 52.
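A minimal sketch of this idea follows (not the implementation of the embodiment: camera poses are simplified to translation-only, the intrinsics and all numbers are illustrative, and SciPy's general-purpose least_squares solver stands in for the actual optimizer):

    import numpy as np
    from scipy.optimize import least_squares

    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])  # pre-calibrated internal parameters (illustrative)

    def project(point_w, cam_t):
        """Project a world point with a camera at translation cam_t (rotation omitted)."""
        p_cam = point_w - cam_t
        uv = K @ p_cam
        return uv[:2] / uv[2]

    def residuals(x, observations, n_cams):
        """Reprojection residuals; x packs the camera translations and point coordinates."""
        cams = x[:3 * n_cams].reshape(n_cams, 3)
        points = x[3 * n_cams:].reshape(-1, 3)
        res = []
        for cam_i, pt_i, uv_obs in observations:
            res.extend(project(points[pt_i], cams[cam_i]) - uv_obs)
        return np.array(res)

    # Initial values: camera positions from the tracking unit, point coordinates from BIM.
    cams0 = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
    points0 = np.array([[1.0, 0.2, 5.0]])                  # e.g. a point on a wall taken from BIM
    observations = [(0, 0, np.array([420.0, 260.0])),      # (camera index, point index, observed pixel)
                    (1, 0, np.array([435.0, 258.0]))]
    x0 = np.concatenate([cams0.ravel(), points0.ravel()])
    result = least_squares(residuals, x0, args=(observations, 2))
    # result.x holds the adjusted camera positions (self-position) and point coordinates (map points).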

There may also be other objects, such as fixtures, between the pillar 90c and the imaging device 17. Information on such objects is not included in the BIM information, but since they are drawn in the captured images, the bundle adjustment unit 104 can also estimate the positions of objects not included in the BIM information by changing the BIM-based initial values through bundle adjustment.

FIG. 5 is an image diagram showing an example of bundle adjustment according to the first embodiment. For example, using the following equation (4), the bundle adjustment unit 104 estimates the position of the imaging device 17 and the three-dimensional coordinates of the point 52 so as to minimize the errors between the projection points 401a and 401b, at which the point 52 in three-dimensional space is projected onto the two captured images 43 and 44 shown in FIG. 5, and the feature points 402a and 402b corresponding to the point 52 drawn in the captured images 43 and 44. Hereinafter, when the captured images 43 and 44 are distinguished, the captured image 43 is referred to as the first image and the captured image 44 as the second image for convenience. In the present embodiment, the internal parameters of the imaging device 17 are assumed to have been calibrated in advance and are not included in the parameters to be optimized in equation (4).

[Equation (4): the reprojection error minimized by the bundle adjustment, presented as an image in the original publication and not reproduced here.]

The initial value generated by the conversion unit 102 described above is used in equation (4) as the initial value of the world coordinate point shown as the point 52 (the point X_i in three-dimensional space). In FIG. 5, the lines connecting the point 52 with the reference points 170a and 170b representing the position of the imaging device 17 are called ray bundles 6a and 6b. When a range of initial values has been set by the conversion unit 102, the computation of equation (4) is started with world coordinates contained in that range set for the point X_i in three-dimensional space. If the error is not minimized by computation within the range of initial values, the optimal value of the point X_i in three-dimensional space may be sought beyond that range.
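For reference only, a standard reprojection-error objective consistent with the description above (and not necessarily the exact expression of equation (4)) is

    \min_{\{T_j\},\,\{X_i\}} \; \sum_{i}\sum_{j} \rho\!\left( \left\| \mathbf{x}_{ij} - \pi\!\left( T_j, X_i \right) \right\|^{2} \right)

where X_i is a point in three-dimensional space (such as the point 52), T_j is the position and posture of the imaging device 17 for the j-th image, x_ij is the feature point observed for X_i in that image, π is the projection through the pre-calibrated internal parameters, and ρ is an optional robust loss.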

The bundle adjustment unit 104 also estimates the positions of planes or curved surfaces of surrounding objects based on the BIM information, and calculates the distances from the imaging device 17 to the surrounding objects under the constraint condition that a plurality of surrounding points are located on a plane or curved surface. For example, the points 50b to 50d shown in FIG. 4 all lie on the plane 901. In such a case, the bundle adjustment unit 104 imposes a constraint condition given by the plane equation when executing the bundle adjustment process based on captured images whose imaging range includes the plane 901. Hereinafter, when the individual points 50a to 50d in three-dimensional space are not particularly distinguished, they are simply referred to as points 50.

For example, as shown in equations (5) and (6), the bundle adjustment unit 104 estimates the positions of the points 50 and the position of the imaging device 17, with the existence of the plane as a constraint condition, by solving an optimization problem by the nonlinear least squares method with nonlinear functions f(x) and g(x). The function f(x) corresponds to equation (4) above. The penalty method or the augmented Lagrangian method can be applied to solve equations (5) and (6), but other solution methods may also be adopted.

[Equations (5) and (6): the constrained nonlinear least squares problem, presented as an image in the original publication and not reproduced here.]
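Read together with the description above, equations (5) and (6) can be understood as a constrained problem of the following general form (an interpretation, not the exact expressions):

    \min_{x} f(x) \quad \text{subject to} \quad g(x) = 0

where f(x) is the reprojection error of equation (4) and g(x) stacks, for example, plane constraints n_k·X_i + d_k = 0 for every point X_i that the BIM information indicates should lie on plane k. A penalty formulation replaces this by minimizing f(x) + μ‖g(x)‖² with an increasing penalty weight μ, while the augmented Lagrangian method additionally carries multiplier estimates for the constraints.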

The structures around the information processing device 1 may have not only planes but also curved surfaces. For example, the outer surface of the pillar 90b is a curved surface. In such a case, the bundle adjustment unit 104 may impose a constraint condition given by the equation of the curved surface, based on the BIM information, so that points in three-dimensional space lie on the curved surface.

The bundle adjustment unit 104 generates, as map information, a point cloud having three-dimensional coordinates based on the three-dimensional coordinates of the plurality of points 50 after bundle adjustment. The bundle adjustment unit 104 also updates the map information by adding new points 50 to, or deleting points 50 from, the map information. Furthermore, the bundle adjustment unit 104 may adjust the self-position estimation result and the map information by additionally taking into account the detection results of the IMU sensor 18.

In the bundle adjustment process, the bundle adjustment unit 104 calculates the positions of surrounding objects as the spatial coordinates of the plurality of points 50 in three-dimensional space, and outputs the calculated spatial coordinates of the plurality of points 50 as map information. In the present embodiment, "output" includes saving to the auxiliary storage device 14 or transmitting to the external device 2.

For example, the bundle adjustment unit 104 stores the estimated self-position and the generated map information in the auxiliary storage device 14. The bundle adjustment unit 104 may also transmit the estimated self-position and the generated map information to the external device 2.

The movement control unit 105 moves the information processing device 1 by controlling the moving device 16. For example, the movement control unit 105 searches for a movable route based on the map information stored in the auxiliary storage device 14 and the current self-position. The movement control unit 105 then controls the moving device 16 based on the search result.
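As an illustration only (the publication does not specify the route search algorithm; the occupancy grid, its resolution, and all names here are hypothetical), a movable route could be searched on a grid derived from the point cloud map, for example by breadth-first search:

    from collections import deque

    def search_route(grid, start, goal):
        """grid[y][x] == 0 means free space; returns a list of (y, x) cells or None."""
        h, w = len(grid), len(grid[0])
        prev = {start: None}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            if cur == goal:
                route = []
                while cur is not None:
                    route.append(cur)
                    cur = prev[cur]
                return route[::-1]
            y, x = cur
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] == 0 and (ny, nx) not in prev:
                    prev[(ny, nx)] = cur
                    queue.append((ny, nx))
        return None

    # Cells marked 1 would be derived from map points (and, if available, obstacle
    # detections from the distance measuring sensors).
    grid = [[0, 0, 0],
            [1, 1, 0],
            [0, 0, 0]]
    print(search_route(grid, (0, 0), (2, 0)))  # a route from the current cell to the goal cell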

When the information processing device 1 is provided with distance measuring sensors such as an ultrasonic sensor or a laser scanner, the movement control unit 105 may generate a movement route that avoids obstacles based on the obstacle detection results of these sensors. The movement control method of the information processing device 1 is not limited to these, and various autonomous movement methods can be applied.

Next, the flow of the self-position estimation and map information generation process executed by the information processing device 1 of the present embodiment configured as described above will be described.

FIG. 6 is a flowchart showing an example of the flow of the self-position estimation and map information generation process according to the first embodiment.

First, the acquisition unit 101 acquires the BIM information from the external device 2 (S1). The acquisition unit 101 stores the acquired BIM information in the auxiliary storage device 14.

Then, the movement control unit 105 starts the movement of the information processing device 1 by controlling the moving device 16 (S2).

Next, the acquisition unit 101 acquires captured images from the imaging device 17. The acquisition unit 101 also acquires sensing results such as the angular velocity and acceleration from the IMU sensor 18 (S3).

Next, the tracking unit 103 identifies the current position and posture of the imaging device 17 based on the captured images (S4).

Then, the conversion unit 102 generates, from the BIM information, initial values of the three-dimensional coordinates of points on the structures, based on the current position and posture of the imaging device 17 identified by the tracking unit 103 (S5).

Then, the bundle adjustment unit 104 executes the bundle adjustment process (S6). Specifically, based on the initial values of the three-dimensional coordinates of points on the structures around the imaging device 17 generated from the BIM information, the position and posture of the imaging device 17 identified by the tracking unit 103, and the captured images, the bundle adjustment unit 104 calculates the distances from the imaging device 17 to surrounding objects and estimates the position and posture of the imaging device 17 and the three-dimensional coordinates of the surrounding objects. The bundle adjustment unit 104 also generates map information based on the estimated three-dimensional coordinates of the surrounding objects.

The bundle adjustment unit 104 stores the estimated self-position and the generated map information in, for example, the auxiliary storage device 14.

The movement control unit 105 also searches for a movement route based on the map information stored in the auxiliary storage device 14 and the current self-position, and moves the information processing device 1 by controlling the moving device 16 based on the search result.

Then, the movement control unit 105 determines whether or not to end the movement of the information processing device 1 (S7). For example, when the information processing device 1 has arrived at a predetermined end point, the movement control unit 105 determines that the movement of the information processing device 1 is to be ended. The conditions for determining the end of movement are not particularly limited; for example, the movement control unit 105 may determine that the movement of the information processing device 1 is to be ended when an instruction to end the movement is input from the outside via the communication network 3.

If the movement control unit 105 does not determine that the movement is to be ended (S7 "No"), the process returns to S3, and the processes of S3 to S7 are repeated. If the movement control unit 105 determines that the movement is to be ended (S7 "Yes"), the process of this flowchart ends.

As described above, the information processing device 1 of the present embodiment executes self-position estimation and map information generation based on the BIM information and the images captured around the information processing device 1. Therefore, according to the information processing device 1 of the present embodiment, the accuracy of the self-position estimation and of the map information can be improved by using the BIM information in the self-position estimation and map information generation processes.

For example, the information processing device 1 of the present embodiment converts the BIM information into an input value for at least one of the self-position estimation process and the map information generation process performed by the SLAM processing unit 120, and executes the self-position estimation and the map information generation based on that input value. This improves the accuracy of the self-position estimation and the map information compared with estimating the self-position and generating the map information based only on detection results of the surroundings, such as captured images.

The information processing device 1 of the present embodiment also identifies the position and posture of the imaging device 17 by tracking a plurality of images captured at different times, and calculates the distances from the imaging device 17 to surrounding objects, the self-position, and the positions of the surrounding objects by changing the initial values, or the range of initial values, of the three-dimensional coordinates of points on surrounding objects based on the BIM information, using the position and posture of the imaging device 17 identified by the tracking and the captured images. More specifically, the information processing device 1 of the present embodiment uses the BIM-based distances from the imaging device 17 to surrounding objects as initial values, or a range of initial values, in the bundle adjustment process. Therefore, according to the information processing device 1 of the present embodiment, the amount of computation in the bundle adjustment process can be reduced by using these BIM-based distances as the initial values, or range of initial values, of the three-dimensional coordinates of points on the surrounding objects.

For example, as a comparison, a general bundle adjustment process may assume that the initial values of the three-dimensional coordinates of points in three-dimensional space are at infinity. In such a case, the bundle adjustment process starts without it being determined whether the distance between a point in three-dimensional space and the imaging device is, for example, 1 m or 1000 m, so the amount of computation required for the calculation result to converge may increase. In contrast, since the information processing device 1 of the present embodiment uses initial values based on the BIM information, the processing result can be converged with a small amount of computation.

 Further, the information processing device 1 of the present embodiment estimates the position of a plane or curved surface of a surrounding object based on the BIM information, and calculates the distance from the imaging device 17 to the surrounding object under the constraint that a plurality of surrounding points lie on that plane or curved surface. Therefore, according to the information processing device 1 of the present embodiment, the amount of computation can be reduced compared with obtaining the positions of a plurality of points lying on the same plane or curved surface individually.
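As one way to picture this planar constraint, the sketch below parameterizes points that the BIM model places on the same wall by two coordinates on that plane, so an optimizer would estimate two unknowns per point instead of three. The helper names and the example wall at x = 5 m are assumptions, not part of the embodiment.

```python
import numpy as np

def plane_basis(normal, origin):
    """Return (origin, u, v): an orthonormal basis spanning the plane."""
    n = normal / np.linalg.norm(normal)
    u = np.cross(n, [0.0, 0.0, 1.0])
    if np.linalg.norm(u) < 1e-6:            # normal happens to be parallel to z
        u = np.cross(n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    return origin, u, v

def point_on_plane(params_2d, origin, u, v):
    """Map 2D optimization parameters (a, b) to a 3D point constrained to the plane."""
    a, b = params_2d
    return origin + a * u + b * v

# A wall at x = 5 m taken from the BIM model (illustrative values).
origin, u, v = plane_basis(normal=np.array([1.0, 0.0, 0.0]),
                           origin=np.array([5.0, 0.0, 0.0]))
X = point_on_plane((1.2, 0.8), origin, u, v)
print(X)
```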

 Further, the information processing device 1 of the present embodiment calculates the positions of the surrounding objects as the spatial coordinates of a plurality of points 50 in three-dimensional space, and outputs the calculated spatial coordinates of the plurality of points 50 as map information. According to the information processing device 1 of the present embodiment, more accurate map information can be provided by outputting, as map information, the positions of the surrounding objects calculated by the bundle adjustment process using the BIM information.

 The information processing device 1 may be a robot or the like having functions such as monitoring, security, cleaning, or delivery of packages. In this case, the information processing device 1 realizes these functions by moving through the building 9 based on the estimated self-position and the map information. The map information generated by the information processing device 1 may be used not only for generating the movement route of the information processing device 1 itself but also for monitoring or managing the building 9 from a remote location. The map information generated by the information processing device 1 may also be used to generate the movement route of a robot or drone other than the information processing device 1.

 The imaging device 17 is not limited to a stereo camera. For example, the imaging device 17 may be an RGB-D camera having an RGB (Red Green Blue) camera and a three-dimensional measurement camera (depth camera), a monocular camera, or the like.

 また、情報処理装置1が備えるセンサは、IMUセンサ18に限定されるものではなく、ジャイロセンサ、加速度センサ、磁気センサ等が個別に設けられてもよい。 Further, the sensor included in the information processing device 1 is not limited to the IMU sensor 18, and a gyro sensor, an acceleration sensor, a magnetic sensor, or the like may be individually provided.

 Further, in the present embodiment, the SLAM processing unit 120 executes visual SLAM using captured images, but SLAM that does not use captured images may also be adopted. For example, the information processing device 1 may detect the surrounding structures by Lidar (Light Detection and Ranging, or Laser Imaging Detection and Ranging) or the like instead of the imaging device 17. In this case, the SLAM processing unit 120 may identify the position and orientation of the information processing device 1 based on the ranging results obtained by the Lidar.

 また、本実施形態では、SLAM処理部120は3次元の地図情報を生成するとしたが、2次元の地図情報を生成するものとしてもよい。 Further, in the present embodiment, the SLAM processing unit 120 is supposed to generate three-dimensional map information, but it may be possible to generate two-dimensional map information.

 Further, equations (1) to (6) illustrated in the present embodiment are merely examples, and the mathematical expressions used in the tracking process or the bundle adjustment process are not limited to these. The bundle adjustment unit 104 may also perform the bundle adjustment according to equation (4) without imposing the constraints of equations (5) and (6). Further, the tracking process may be executed without using the neighborhood pattern N_p.

 Further, the SLAM processing unit 120 may estimate the self-position and generate the map information by a method other than the tracking process or the bundle adjustment process. For example, although the tracking unit 103 is given as an example of the specifying unit in the present embodiment, a method of identifying changes in the position and orientation of the information processing device 1 other than tracking may be adopted.

 また、SLAM処理には、本実施形態で例示した処理以外に、自己位置推定または地図情報の精度を向上させるための各種の処理が追加されてもよい。例えば、SLAM処理部120は、ループ閉じ込み処理等をさらに実行してよい。 Further, in addition to the processes exemplified in this embodiment, various processes for improving the accuracy of self-position estimation or map information may be added to the SLAM process. For example, the SLAM processing unit 120 may further execute a loop closing process or the like.

 A part or all of the information processing device 1 in the above-described embodiment may be configured by hardware, or may be configured by information processing of software (a program) executed by a CPU, a GPU, or the like. When configured by information processing of software, the software that realizes at least some of the functions of each device in the above-described embodiment may be stored in a non-transitory storage medium (non-transitory computer-readable medium) such as a flexible disk, a CD-ROM (Compact Disc-Read Only Memory), or a USB memory, and the information processing of the software may be executed by loading it into a computer. The software may also be downloaded via a communication network. Furthermore, the information processing may be executed by hardware by implementing the software in a circuit such as an ASIC or an FPGA.

 ソフトウェアを収納する記憶媒体の種類は限定されるものではない。記憶媒体は、磁気ディスク、又は光ディスク等の着脱可能なものに限定されず、ハードディスク、又はメモリ等の固定型の記憶媒体であってもよい。また、記憶媒体は、コンピュータ内部に備えられてもよいし、コンピュータ外部に備えられてもよい。 The type of storage medium that stores the software is not limited. The storage medium is not limited to a removable one such as a magnetic disk or an optical disk, and may be a fixed storage medium such as a hard disk or a memory. Further, the storage medium may be provided inside the computer or may be provided outside the computer.

 Further, in FIG. 1, the information processing device 1 includes one of each component, but it may include a plurality of the same components. In addition, although one information processing device 1 is shown in FIG. 1, the software may be installed on a plurality of computers, and each of the plurality of computers may execute the same or a different part of the processing of the software. In this case, a form of distributed computing may be adopted in which the computers communicate with each other via the network interface 13 or the like to execute the processing. That is, the information processing device 1 in the above-described embodiment may be configured as a system in which one or more computers execute instructions stored in one or more storage devices to realize the functions. A configuration may also be adopted in which information transmitted from a terminal is processed by one or more computers provided on a cloud and the processing result is transmitted back to the terminal.

 上述した実施形態における情報処理装置1の各種演算は、1又は複数のプロセッサを用いて、又は、ネットワークを介した複数台のコンピュータを用いて、並列処理で実行されてもよい。また、各種演算が、プロセッサ内に複数ある演算コアに振り分けられて、並列処理で実行されてもよい。また、本開示の処理、手段等の一部又は全部は、ネットワークを介して情報処理装置1通信可能なクラウド上に設けられたプロセッサ及び記憶装置の少なくとも一方により実行されてもよい。このように、上述した実施形態における各装置は、1台又は複数台のコンピュータによる並列コンピューティングの形態であってもよい。 Various operations of the information processing device 1 in the above-described embodiment may be executed in parallel processing by using one or a plurality of processors or by using a plurality of computers via a network. Further, various operations may be distributed to a plurality of arithmetic cores in the processor and executed in parallel processing. In addition, some or all of the processes, means, etc. of the present disclosure may be executed by at least one of a processor and a storage device provided on the cloud capable of communicating with the information processing device 1 via the network. As described above, each device in the above-described embodiment may be in the form of parallel computing by one or a plurality of computers.

 The information processing device 1 in the above-described embodiment may be realized by one or more processors 11. Here, the processor 11 may refer to one or more electronic circuits arranged on one chip, or to one or more electronic circuits arranged on two or more chips or two or more devices. When a plurality of electronic circuits are used, the electronic circuits may communicate with each other by wire or wirelessly.

 A plurality of processors may be connected (coupled) to one storage device (memory), or a single processor may be connected. A plurality of storage devices (memories) may be connected (coupled) to one processor. When the information processing device 1 in the above-described embodiment is composed of at least one storage device (memory) and a plurality of processors connected (coupled) to the at least one storage device (memory), a configuration may be included in which at least one of the plurality of processors is connected (coupled) to the at least one storage device (memory). This configuration may also be realized by storage devices (memories) and processors included in a plurality of computers. Furthermore, a configuration in which a storage device (memory) is integrated with a processor (for example, a cache memory including an L1 cache and an L2 cache) may be included.

 また、外部装置2は、サーバ装置に限定されるものではない。また、外部装置2は、クラウド環境に設けられてもよい。また、外部装置2を、請求の範囲における情報処理装置の一例としてもよい。 Further, the external device 2 is not limited to the server device. Further, the external device 2 may be provided in a cloud environment. Further, the external device 2 may be used as an example of the information processing device in the claims.

 また、他の一例として、外部装置2は、入力装置であってもよい。また、デバイスインタフェース15は、移動装置16、撮像装置17、IMUセンサ18だけではなく、入力装置と接続するものとしてもよい。入力装置は、例えば、カメラ、マイクロフォン、モーションキャプチャ、各種センサ、キーボード、マウス、又はタッチパネル等のデバイスであり、取得した情報を情報処理装置1に与える。また、パーソナルコンピュータ、タブレット端末、又はスマートフォン等の入力部とメモリとプロセッサを備えるデバイスであってもよい。 Further, as another example, the external device 2 may be an input device. Further, the device interface 15 may be connected not only to the mobile device 16, the image pickup device 17, and the IMU sensor 18, but also to the input device. The input device is, for example, a device such as a camera, a microphone, a motion capture, various sensors, a keyboard, a mouse, or a touch panel, and gives the acquired information to the information processing device 1. Further, it may be a device including an input unit, a memory and a processor such as a personal computer, a tablet terminal, or a smartphone.

 また、他の一例として、外部装置2は、出力装置でもよい。また、デバイスインタフェース15は、出力装置と接続するものとしてもよい。出力装置は、例えば、LCD(Liquid Crystal Display)、CRT(Cathode Ray Tube)、PDP(Plasma Display Panel)、又は有機EL(Electro Luminescence)パネル等の表示装置であってもよいし、音声等を出力するスピーカ等であってもよい。また、パーソナルコンピュータ、タブレット端末、又はスマートフォン等の出力部とメモリとプロセッサを備えるデバイスであってもよい。 Further, as another example, the external device 2 may be an output device. Further, the device interface 15 may be connected to the output device. The output device may be, for example, a display device such as an LCD (Liquid Crystal Display), a CRT (Cathode Ray Tube), a PDP (Plasma Display Panel), or an organic EL (Electro Luminescence) panel, and outputs audio or the like. It may be a speaker or the like. Further, it may be a device including an output unit such as a personal computer, a tablet terminal, or a smartphone, a memory, and a processor.

 また、他の一例として、外部装置2は、記憶装置(メモリ)であってもよい。また、デバイスインタフェース15は、記憶装置(メモリ)と接続するものとしてもよい。例えば、外部装置2はネットワークストレージ等であってもよく、デバイスインタフェース15にはHDD等のストレージが接続するものとしてもよい。 Further, as another example, the external device 2 may be a storage device (memory). Further, the device interface 15 may be connected to a storage device (memory). For example, the external device 2 may be a network storage or the like, and a storage such as an HDD may be connected to the device interface 15.

 また、外部装置2、または、デバイスインタフェース15に接続する外部装置は、上述した実施形態における情報処理装置1の構成要素の一部の機能を有する装置でもよい。つまり、情報処理装置1は、外部装置2または、デバイスインタフェース15に接続する外部装置の処理結果の一部又は全部を送信又は受信してもよい。 Further, the external device 2 or the external device connected to the device interface 15 may be a device having some functions of the components of the information processing device 1 in the above-described embodiment. That is, the information processing device 1 may transmit or receive a part or all of the processing results of the external device 2 or the external device connected to the device interface 15.

 Further, while executing the self-position estimation process and the map information generation process, the information processing device 1 may be constantly connected to the external device 2 via the communication network 3, but the connection is not limited to this. For example, the information processing device 1 may keep the connection with the external device 2 offline while executing the self-position estimation process and the map information generation process.

 In the present embodiment, the world coordinates of the points included in the building 9, which the conversion unit 102 identifies from the BIM information, are used as an example of the information on the positions of the surrounding objects, but the information on the positions of the surrounding objects is not limited to this.

 例えば、変換部102は、BIM情報と、撮像装置17の位置および姿勢とに基づいて、情報処理装置1と周囲の物体との距離を示す情報を生成しても良い。撮像装置17の位置および姿勢は、例えば、トラッキング部103によって特定された位置および姿勢を採用することができる。 For example, the conversion unit 102 may generate information indicating the distance between the information processing device 1 and a surrounding object based on the BIM information and the position and orientation of the image pickup device 17. As the position and orientation of the image pickup apparatus 17, for example, the position and orientation specified by the tracking unit 103 can be adopted.

 In this case, the distance between the point X_i in three-dimensional space in equation (4) and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 is specified by the distance information generated by the conversion unit 102. At the start of the calculation, the bundle adjustment unit 104 sets the value of each parameter so that it matches the distance, specified by the distance information generated by the conversion unit 102, between the point X_i in three-dimensional space and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17. Also in this method, the point X_i in three-dimensional space and the position of the imaging device 17 obtained as a result of the error adjustment by the bundle adjustment may differ from the results obtained from the BIM information.

 Further, for example, the conversion unit 102 may identify the dimensions of the building 9 from the BIM information and use these dimensions as an example of the information on the positions of the surrounding objects. In this case, the bundle adjustment unit 104 performs the bundle adjustment so that the point X_i in three-dimensional space in equation (4) and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 fall within the range of the dimensions of the building 9 identified from the BIM information. For example, when the building 9 identified from the BIM information is 50 m long and 50 m wide, a point X_i in three-dimensional space belonging to the structure of the building 9 is never 50 m or more away from the position of the imaging device 17, so the range of values that the point X_i and the positions (R_j, t_j) and (R_{j+1}, t_{j+1}) of the imaging device 17 can take is limited. In this way, the bundle adjustment unit 104 can reduce the amount of computation by performing the bundle adjustment using the information on the positions of the surrounding objects converted from the BIM information by the conversion unit 102.
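One possible way to express this "within the building dimensions" constraint in code is to bound the optimization variables by the building extent taken from the BIM model, for example with scipy's bounded least squares. The residual function, the 50 m x 50 m x 10 m extent, and the toy ray observations below are illustrative assumptions rather than the equations of the embodiment.

```python
import numpy as np
from scipy.optimize import least_squares

# Building extent read from the BIM model (illustrative corner coordinates).
building_min = np.array([0.0, 0.0, 0.0])
building_max = np.array([50.0, 50.0, 10.0])

def residuals(point_xyz, observed_rays):
    """Toy reprojection-style residual: distance of the point from each observed ray."""
    res = []
    for origin, direction in observed_rays:
        d = point_xyz - origin
        res.extend(d - np.dot(d, direction) * direction)   # component off the ray
    return np.asarray(res)

rays = [(np.array([0.0, 0.0, 1.5]), np.array([1.0, 0.0, 0.0])),
        (np.array([0.0, 5.0, 1.5]), np.array([0.8, -0.6, 0.0]))]
x0 = np.array([25.0, 25.0, 1.5])            # start inside the BIM extent
sol = least_squares(residuals, x0, args=(rays,),
                    bounds=(building_min, building_max))
print(sol.x)                                 # stays within the building dimensions
```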

(Second Embodiment)
 In the first embodiment described above, the environmental information is three-dimensional design information such as BIM information or 3D CAD data. In this second embodiment, the environmental information includes at least one of entry/exit information of persons in the building 9 and image recognition results of persons in the captured images. The environmental information may include both the entry/exit information of persons in the building 9 and the image recognition results of persons in the captured images, or only one of them.

 図7は、第2の実施形態に係る情報処理装置1が備える機能の一例を示すブロック図である。図7に示すように、本実施形態の情報処理装置1は、取得部1101と、変換部1102と、SLAM処理部1120と、移動制御部105とを備える。また、SLAM処理部1120は、トラッキング部1103と、バンドル調整部1104とを含む。また、変換部1102は、初期値生成部106と、マスク情報生成部107とを含む。 FIG. 7 is a block diagram showing an example of the functions included in the information processing device 1 according to the second embodiment. As shown in FIG. 7, the information processing apparatus 1 of the present embodiment includes an acquisition unit 1101, a conversion unit 1102, a SLAM processing unit 1120, and a movement control unit 105. Further, the SLAM processing unit 1120 includes a tracking unit 1103 and a bundle adjusting unit 1104. Further, the conversion unit 1102 includes an initial value generation unit 106 and a mask information generation unit 107.

 移動制御部105は、第1の実施形態と同様の機能を備える。 The movement control unit 105 has the same function as that of the first embodiment.

 本実施形態の取得部1101は、第1の実施形態と同様の機能を備えた上で、建物9における人物の入退出情報を取得する。 The acquisition unit 1101 of the present embodiment has the same function as that of the first embodiment, and acquires the entry / exit information of the person in the building 9.

 入退出情報は、建物9の部屋ごとまたはフロアごとに入退出した人物の人数と、入退出の時刻とを表す情報である。例えば、建物9の部屋またはフロアの出入り口に、入退出を検出するセンサが設置され、センサによる検出結果が外部装置2に送信されているものとする。この場合、取得部1101は、外部装置2から入退出情報を取得する。なお、人物の入退出の検出方法はセンサに限定されるものではない。例えば、入退出情報は、カードリーダによるセキュリティカードの読み取り記録や、建物9に設置された監視カメラの撮像画像からの人物の検出結果であってもよい。 The entry / exit information is information indicating the number of people entering / exiting each room or floor of the building 9 and the time of entering / exiting. For example, it is assumed that a sensor for detecting entry / exit is installed at the entrance / exit of a room or floor of the building 9, and the detection result by the sensor is transmitted to the external device 2. In this case, the acquisition unit 1101 acquires the entry / exit information from the external device 2. The method of detecting the entry / exit of a person is not limited to the sensor. For example, the entry / exit information may be a reading record of a security card by a card reader or a detection result of a person from an image captured by a surveillance camera installed in a building 9.
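A possible (assumed) record layout for this entry/exit information is sketched below: one event per entry or exit, keyed by room or floor, from which the occupancy at an image's capture time can be summed. The field names and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List

@dataclass
class EntryExitEvent:
    area_id: str          # room or floor identifier
    timestamp: datetime   # time of the entry or exit
    delta: int            # +1 for an entry, -1 for an exit

def occupancy_at(events: List[EntryExitEvent], area_id: str, when: datetime) -> int:
    """Number of people believed to be in the area at the given time."""
    return sum(e.delta for e in events
               if e.area_id == area_id and e.timestamp <= when)

events = [EntryExitEvent("room-3F-301", datetime(2021, 4, 1, 9, 0), +1),
          EntryExitEvent("room-3F-301", datetime(2021, 4, 1, 9, 30), +1),
          EntryExitEvent("room-3F-301", datetime(2021, 4, 1, 10, 0), -1)]
print(occupancy_at(events, "room-3F-301", datetime(2021, 4, 1, 9, 45)))  # -> 2
```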

 取得部1101は、取得した入退出情報を、補助記憶装置14に保存する。 The acquisition unit 1101 stores the acquired entry / exit information in the auxiliary storage device 14.

 The conversion unit 1102 of the present embodiment has the same functions as in the first embodiment and, in addition, generates, based on the environmental information, mask information representing an area in the building 9 in which the information processing device 1 is located that is to be excluded from the target of map information generation.

 本実施形態において、環境情報は、少なくとも入退出情報、または人物の画像認識結果の一方を含むものとする。人物の画像認識結果は、撮像装置17によって撮像された撮像画像から、画像処理によって人物が認識された結果である。 In the present embodiment, the environmental information includes at least one of the entry / exit information and the image recognition result of the person. The image recognition result of a person is a result of recognizing a person by image processing from the captured image captured by the image pickup device 17.

 具体的には、本実施形態の環境情報は、入退出情報と画像認識結果との両方と、第1の実施形態と同様の3次元設計情報とを含む。 Specifically, the environmental information of the present embodiment includes both the entry / exit information and the image recognition result, and the same three-dimensional design information as that of the first embodiment.

 より詳細には、変換部1102は、初期値生成部106と、マスク情報生成部107とを含む。初期値生成部106は、第1の実施形態における変換部102と同様の機能を備える。 More specifically, the conversion unit 1102 includes an initial value generation unit 106 and a mask information generation unit 107. The initial value generation unit 106 has the same function as the conversion unit 102 in the first embodiment.

 また、マスク情報は、地図情報の生成の対象から除外される領域を表す情報である。 The mask information is information representing an area excluded from the target of map information generation.

 The mask information generation unit 107 determines an area in the building 9 where a person is located based on the entry/exit information or the image recognition result of the person, and sets the determined area as an area to be excluded from the target of map information generation.

 The mask information generation unit 107 recognizes persons from the captured image captured by the imaging device 17 by image processing. When it is difficult to determine by image processing whether an object depicted in the captured image is a person, the mask information generation unit 107 determines, based on the entry/exit information, whether a person was present in the room or floor where the captured image was captured at the time the image was captured. When the mask information generation unit 107 determines that a person was present in that room or floor at the capture time, it estimates the probability that the object depicted in the captured image is a person to be higher than when it determines that no person was present.
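The decision just described can be pictured as combining a detector confidence with the occupancy derived from the entry/exit information, as in the hedged sketch below; the 0.5 threshold and the 0.3 boost are arbitrary illustrative values, not values given in the embodiment.

```python
def person_probability(detector_score: float, room_occupied: bool,
                       boost: float = 0.3) -> float:
    """Combine a detector confidence in [0, 1] with room occupancy information."""
    if room_occupied:
        return min(1.0, detector_score + boost)   # people known to be present
    return detector_score

def should_mask(detector_score: float, room_occupied: bool,
                threshold: float = 0.5) -> bool:
    """Mask the region (exclude it from map generation) if it is likely a person."""
    return person_probability(detector_score, room_occupied) >= threshold

print(should_mask(0.35, room_occupied=True))    # True  - boosted above the threshold
print(should_mask(0.35, room_occupied=False))   # False - stays ambiguous
```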

 図8は、第2の実施形態に係る情報処理装置1と周囲の物体との位置関係の一例を示すイメージ図である。図8に示す例では、建物9において、情報処理装置1が存在する部屋に、人物70が存在する。柱90a~90cなどとは異なり、人物70は移動するため、地図情報に人物70の存在を含めると、地図情報の精度が低下する可能性がある。 FIG. 8 is an image diagram showing an example of the positional relationship between the information processing device 1 and surrounding objects according to the second embodiment. In the example shown in FIG. 8, in the building 9, the person 70 exists in the room where the information processing device 1 exists. Unlike the pillars 90a to 90c, the person 70 moves, so if the presence of the person 70 is included in the map information, the accuracy of the map information may decrease.

 マスク情報生成部107は、人物70が存在する領域80を表すマスク情報を生成する。マスク情報は、例えば、人物70が存在する領域80を3次元座標で表す。 The mask information generation unit 107 generates mask information representing the area 80 in which the person 70 exists. The mask information represents, for example, the area 80 in which the person 70 exists in three-dimensional coordinates.

 In the present embodiment, the mask information generation unit 107 generates the mask information using both the entry/exit information and the image recognition result of the person, but the mask information may be generated based on only one of them.

 Further, the mask information generation unit 107 may detect, by image recognition from the captured image captured by the imaging device 17, objects such as moving bodies such as vehicles, or equipment temporarily present in the building 9 such as carts. In this case, the mask information generation unit 107 sets the areas in which these objects are determined to be present as areas to be excluded from the target of map information generation.

 マスク情報生成部107は、地図情報の生成の対象から除外する領域を表すマスク情報を生成し、SLAM処理部1120に送出する。 The mask information generation unit 107 generates mask information representing an area to be excluded from the target of map information generation, and sends it to the SLAM processing unit 1120.

 図7に戻り、本実施形態のSLAM処理部1120は、第1の実施形態の機能を備えた上で、マスク情報に相当する領域については、地図情報を生成しない。 Returning to FIG. 7, the SLAM processing unit 1120 of the present embodiment has the functions of the first embodiment and does not generate map information for the area corresponding to the mask information.

 More specifically, when the tracking unit 1103 of the SLAM processing unit 1120 of the present embodiment executes the tracking process according to equation (1) as in the first embodiment, if the neighborhood pattern N_p corresponds to an image area in which the area represented by the mask information is depicted, the tracking unit 1103 multiplies the neighborhood pattern N_p by a mask value. The mask value is, for example, "0" or "1", but is not limited to these. The method of applying the mask is not limited to this, and other methods may be adopted.
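A minimal sketch of this masking step is shown below: each pixel of the neighborhood pattern N_p contributes to the tracking error only where its mask value is 1. The quadratic photometric error used here is a generic stand-in, not the exact expression (1) of the embodiment.

```python
import numpy as np

def masked_tracking_error(ref_patch, cur_patch, mask):
    """Sum of squared intensity differences over the pattern; mask is 0 or 1 per pixel."""
    ref = np.asarray(ref_patch, dtype=float)
    cur = np.asarray(cur_patch, dtype=float)
    m = np.asarray(mask, dtype=float)
    return float(np.sum(m * (ref - cur) ** 2))

ref = [[10, 12], [11, 13]]
cur = [[10, 50], [11, 13]]          # one pixel falls on a masked (person) region
mask = [[1, 0], [1, 1]]             # that pixel is excluded from the error
print(masked_tracking_error(ref, cur, mask))   # -> 0.0
```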

 また、本実施形態のSLAM処理部1120のバンドル調整部1104は、第1の実施形態の機能を備えた上で、マスク情報に相当する領域については、バンドル調整の対象外とする。 Further, the bundle adjustment unit 1104 of the SLAM processing unit 1120 of the present embodiment has the functions of the first embodiment, and the area corresponding to the mask information is excluded from the bundle adjustment.

 次に、以上のように構成された本実施形態の情報処理装置1で実行される自己位置推定および地図情報の生成処理の流れについて説明する。 Next, the flow of self-position estimation and map information generation processing executed by the information processing device 1 of the present embodiment configured as described above will be described.

 図9は、第2の実施形態に係る自己位置推定および地図情報の生成処理の流れの一例を示すフローチャートである。 FIG. 9 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the second embodiment.

 The process of acquiring the BIM information in S1 is the same as in the first embodiment described with reference to FIG. 6. Next, the acquisition unit 1101 acquires the entry/exit information (S21). The acquisition unit 1101 stores the acquired entry/exit information in the auxiliary storage device 14.

 The processes from the start of the movement of the information processing device 1 in S2 to the acquisition of the captured image and of the sensing results such as angular velocity and acceleration in S3 are the same as in the first embodiment.

 次に、本実施形態の変換部1102のマスク情報生成部107は、入退出情報または人物70の画像認識結果に基づいて、地図情報の生成の対象から除外する領域を表すマスク情報を生成する(S22)。 Next, the mask information generation unit 107 of the conversion unit 1102 of the present embodiment generates mask information representing an area to be excluded from the target of map information generation based on the entry / exit information or the image recognition result of the person 70 ( S22).

 そして、本実施形態のSLAM処理部1120のトラッキング部1103は、撮像画像に基づいて、撮像装置17の現在の位置および姿勢を特定する(S4)。この際、トラッキング部1103は、マスク情報に相当する領域については、トラッキング処理の対象外とする。 Then, the tracking unit 1103 of the SLAM processing unit 1120 of the present embodiment identifies the current position and orientation of the image pickup device 17 based on the captured image (S4). At this time, the tracking unit 1103 excludes the area corresponding to the mask information from the tracking process.

 Then, the initial value generation unit 106 of the conversion unit 1102 of the present embodiment generates, from the BIM information, the initial values of the three-dimensional coordinates of points on the structures around the imaging device 17, based on the current position and orientation of the imaging device 17 identified by the tracking unit 103 (S5). Since the area corresponding to the mask information is not a target of map information generation, the initial value generation unit 106 does not generate initial values of the three-dimensional coordinates of points on structures within the area corresponding to the mask information.

 次に、バンドル調整部1104は、バンドル調整処理を実行する(S6)。バンドル調整部1104は、マスク情報に相当する領域については、バンドル調整の対象外とする。 Next, the bundle adjustment unit 1104 executes the bundle adjustment process (S6). The bundle adjustment unit 1104 excludes the area corresponding to the mask information from the bundle adjustment.

 The process of determining whether to end the movement of the information processing device 1 in S7 is the same as in the first embodiment. In the present embodiment, when the movement control unit 105 does not determine that the movement is to be ended (S7 "No"), the acquisition unit 1101 acquires the latest entry/exit information again (S23) and returns to the process of S3.

 また、移動制御部105が移動を終了すると判定した場合(S7“Yes”)、このフローチャートの処理は終了する。 Further, when the movement control unit 105 determines that the movement is completed (S7 “Yes”), the processing of this flowchart ends.

 As described above, the information processing device 1 of the present embodiment generates, based on the environmental information, mask information representing an area in the building 9 to be excluded from the target of map information generation, and does not generate map information for the area corresponding to the mask information. Therefore, according to the information processing device 1 of the present embodiment, elements that may reduce the accuracy of the map information, such as a temporarily present person 70 or object, can be excluded, so the accuracy of the map information can be improved.

 For example, in the present embodiment, the environmental information includes the entry/exit information of the person 70 in the building 9 or the image recognition result of the person 70 in the captured image captured by the imaging device 17 mounted on the information processing device 1, and the information processing device 1 of the present embodiment determines the area in the building 9 where the person 70 is located based on the entry/exit information or the image recognition result of the person 70 and sets the determined area as an area to be excluded from the target of map information generation.

 例えば、建物9が作業現場等である場合、建物9の中には作業者等の人物70が存在する場合がある。このような場合、情報処理装置1は、作業者を地図情報に反映しないことにより、地図情報の精度を向上させることができる。また、このような構成により、本実施形態の情報処理装置1は、周囲の環境が人物等によって変化する場合においてもロバストに処理を実行することができる。 For example, when the building 9 is a work site or the like, a person 70 such as a worker may exist in the building 9. In such a case, the information processing device 1 can improve the accuracy of the map information by not reflecting the worker in the map information. Further, with such a configuration, the information processing apparatus 1 of the present embodiment can robustly execute processing even when the surrounding environment changes depending on a person or the like.

 In the present embodiment, the bundle adjustment process is performed based on the environmental information as in the first embodiment, but the information processing device 1 of the second embodiment does not have to include all the functions of the first embodiment. For example, the information processing device 1 may use the environmental information only for generating the mask information and not for the bundle adjustment process. When this configuration is adopted, the environmental information does not have to include the three-dimensional design information.

 Further, in the present embodiment, an example has been described in which the mask information is used when the information processing device 1 generates map information in real time while moving, but the timing of using the mask information is not limited to this. For example, when the map information generated by the information processing device 1 is updated later, mask information based on entry/exit information or the like at a past time may be used.

(Third Embodiment)
 In the first and second embodiments described above, the correspondence between the three-dimensional coordinate system of the three-dimensional design information, such as the BIM information, and the SLAM coordinate system is defined in advance. In this third embodiment, the correspondence between the three-dimensional coordinate system of the three-dimensional design information and the SLAM coordinate system is adjusted while the information processing device 1 is moving.

 FIG. 10 is a block diagram showing an example of the functions included in the information processing device 1 according to the third embodiment. As shown in FIG. 10, the information processing device 1 of the present embodiment includes an acquisition unit 1101, a marker detection unit 108, a calibration unit 109, a conversion unit 2102, a SLAM processing unit 2120, and a movement control unit 105. The SLAM processing unit 2120 includes a tracking unit 2103 and a bundle adjustment unit 1104. The conversion unit 2102 includes an initial value generation unit 1106 and a mask information generation unit 1107.

 移動制御部105は、第1、第2の実施形態と同様の機能を備える。取得部1101は、第2の実施形態と同様の機能を備える。 The movement control unit 105 has the same functions as those of the first and second embodiments. The acquisition unit 1101 has the same function as that of the second embodiment.

 The marker detection unit 108 detects an AR marker from the captured image. The AR marker has, for example, information on the three-dimensional coordinates representing the position at which the AR marker is placed. These three-dimensional coordinates are consistent with the coordinate system of the BIM information. For example, the AR marker represents, in the coordinate system of the BIM information, the position where the AR marker is installed.

 ARマーカは、本実施形態における指標情報の一例である。ARマーカは、建物9の通路に沿った壁や柱等に設置されているものとする。ARマーカは、具体的には、例えばQRコード(登録商標)等であるが、これに限定されるものではない。また、ARマーカの数は特に限定されるものではないが、1つの建物9あたり複数のARマーカが設置されているものとする。また、マーカ検出部108は、本実施形態における指標検出部の一例である。マーカ検出部108は、ARマーカの検出結果を、キャリブレーション部109に送出する。 The AR marker is an example of index information in this embodiment. It is assumed that the AR marker is installed on a wall, a pillar, or the like along the passage of the building 9. Specifically, the AR marker is, for example, a QR code (registered trademark) or the like, but is not limited thereto. The number of AR markers is not particularly limited, but it is assumed that a plurality of AR markers are installed per building 9. Further, the marker detection unit 108 is an example of the index detection unit in the present embodiment. The marker detection unit 108 sends the detection result of the AR marker to the calibration unit 109.

 キャリブレーション部109は、ARマーカの検出結果に基づいて、内部的に保持している自己位置を表す座標系を、BIM情報の座標系と整合するように調整する。キャリブレーション部109は、本実施形態における座標調整部の一例である。 Based on the detection result of the AR marker, the calibration unit 109 adjusts the coordinate system representing the self-position held internally so as to match the coordinate system of the BIM information. The calibration unit 109 is an example of the coordinate adjustment unit in this embodiment.

 例えば、SLAM処理部2120によって推定された自己位置の変化の軌跡が、補助記憶装置14に保存されるが、情報処理装置1の移動に伴って、自己位置の誤差が蓄積される場合がある。このような場合、推定された自己位置と、BIM情報における建物9の立体モデル上の位置との対応関係に差異が生じる。キャリブレーション部109は、マーカ検出部108によって検出されたARマーカが表す3次元座標に基づいて、現在の情報処理装置1の位置を調整することにより、このような誤差の蓄積を解消する。 For example, the locus of change in self-position estimated by the SLAM processing unit 2120 is stored in the auxiliary storage device 14, but an error in self-position may be accumulated as the information processing device 1 moves. In such a case, there is a difference in the correspondence between the estimated self-position and the position of the building 9 on the three-dimensional model in the BIM information. The calibration unit 109 eliminates the accumulation of such errors by adjusting the current position of the information processing device 1 based on the three-dimensional coordinates represented by the AR marker detected by the marker detection unit 108.

 キャリブレーション部109は、キャリブレーション結果を変換部2102に送出する。例えば、キャリブレーション部109は、自己位置を補正するための変換行列を、変換部2102に送出する。 The calibration unit 109 sends the calibration result to the conversion unit 2102. For example, the calibration unit 109 sends a conversion matrix for correcting the self-position to the conversion unit 2102.
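One way such a transformation could be estimated is sketched below: given AR-marker positions observed in the SLAM coordinate system and their known coordinates in the BIM coordinate system, a rigid transform is fitted with the standard Kabsch/SVD alignment. This particular alignment method and the sample marker coordinates are assumptions, not necessarily what the calibration unit 109 does.

```python
import numpy as np

def rigid_transform(slam_pts, bim_pts):
    """Return R (3x3) and t (3,) such that R @ slam + t approximately equals bim."""
    P = np.asarray(slam_pts, dtype=float)
    Q = np.asarray(bim_pts, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)                 # cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    t = cq - R @ cp
    return R, t

slam = [[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 1.5]]    # marker spots seen by SLAM
bim = [[10, 5, 0], [11, 5, 0], [10, 7, 0], [10, 5, 1.5]]  # the same spots in BIM
R, t = rigid_transform(slam, bim)
corrected = R @ np.array([0.5, 1.0, 0.0]) + t             # a self-position in the BIM frame
print(corrected)
```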

 Further, the conversion unit 2102 of the present embodiment has the same functions as in the first and second embodiments and, in addition, converts the environmental information into input values for the self-position estimation process or the map information generation process performed by the SLAM processing unit 2120, based on the self-position adjusted by the calibration unit 109.

 More specifically, the initial value generation unit 1106 has the same functions as in the second embodiment and, in addition, aligns the position of the imaging device 17 identified by the tracking unit 2103 with the position of the imaging device 17 in the BIM information based on the self-position adjusted by the calibration unit 109, and generates, from the BIM information, input values representing the initial values or the ranges of initial values of the bundle adjustment process based on the position and orientation of the imaging device after the alignment.

 For example, the initial value generation unit 1106 identifies, using the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the input values representing the initial values or the ranges of initial values of the bundle adjustment process.

 また、マスク情報生成部1107は、第2の実施形態と同様の機能を備えた上で、キャリブレーション部109によって調整された自己位置に基づいて、マスク情報を生成する。 Further, the mask information generation unit 1107 has the same function as that of the second embodiment, and generates mask information based on the self-position adjusted by the calibration unit 109.

 For example, the mask information generation unit 1107 identifies, using the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the mask information.

 The bundle adjustment unit 1104 of the SLAM processing unit 2120 has the same functions as in the first and second embodiments and, in addition, uses for the bundle adjustment the initial values or the ranges of initial values of the bundle adjustment process that the conversion unit 2102 generated based on the self-position adjusted by the calibration unit 109.

 次に、以上のように構成された本実施形態の情報処理装置1で実行される自己位置推定および地図情報の生成処理の流れについて説明する。 Next, the flow of self-position estimation and map information generation processing executed by the information processing device 1 of the present embodiment configured as described above will be described.

 図11は、第3の実施形態に係る自己位置推定および地図情報の生成処理の流れの一例を示すフローチャートである。 FIG. 11 is a flowchart showing an example of the flow of self-position estimation and map information generation processing according to the third embodiment.

 S1のBIM情報の取得の処理から、S3の撮像画像およびセンシング結果の取得の処理までは、第2の実施形態と同様である。 The process from the process of acquiring the BIM information of S1 to the process of acquiring the captured image and the sensing result of S3 is the same as that of the second embodiment.

 マーカ検出部108は、撮像画像から、ARマーカを検出する(S31)。マーカ検出部108は、ARマーカの検出結果を、キャリブレーション部109に送出する。 The marker detection unit 108 detects the AR marker from the captured image (S31). The marker detection unit 108 sends the detection result of the AR marker to the calibration unit 109.

 キャリブレーション部109は、ARマーカの検出結果に基づいて、キャリブレーション処理を実行する(S32)。例えば、キャリブレーション部109は、BIM情報の座標系における自己位置を調整するための変換行列を生成する。キャリブレーション部109は、生成した変換行列を変換部2102に送出する。 The calibration unit 109 executes the calibration process based on the detection result of the AR marker (S32). For example, the calibration unit 109 generates a transformation matrix for adjusting the self-position of the BIM information in the coordinate system. The calibration unit 109 sends the generated transformation matrix to the conversion unit 2102.

 The mask information generation unit 1107 identifies, using the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the mask information (S22).

 S4のトラッキング処理は、第1、第2の実施形態と同様であるが、当該処理においても、キャリブレーション部109によるキャリブレーション結果を用いてもよい。 The tracking process of S4 is the same as that of the first and second embodiments, but the calibration result by the calibration unit 109 may also be used in the process.

 例えば、本実施形態において、キャリブレーション部109は、キャリブレーション結果を変換部2102に送出するとしたが、さらに、SLAM処理部2120にキャリブレーション結果を送出してもよい。当該構成を採用する場合、SLAM処理部2120のトラッキング部2103は、キャリブレーション結果に基づく3次元座標を用いて、トラッキング処理を実行する。 For example, in the present embodiment, the calibration unit 109 sends the calibration result to the conversion unit 2102, but may further send the calibration result to the SLAM processing unit 2120. When adopting this configuration, the tracking unit 2103 of the SLAM processing unit 2120 executes the tracking process using the three-dimensional coordinates based on the calibration result.

 Then, the initial value generation unit 1106 identifies, using the transformation matrix generated by the calibration unit 109, the three-dimensional coordinates indicating the position of the information processing device 1 in the building 9 on the three-dimensional model of the building 9 in the BIM information, and then generates the input values representing the initial values or the ranges of initial values of the bundle adjustment process (S5).

 S6のバンドル調整処理においても、バンドル調整部1104とは、キャリブレーション結果に基づく3次元座標を用いて、トラッキング処理およびバンドル調整処理を実行してもよい。 Also in the bundle adjustment process of S6, the bundle adjustment unit 1104 may execute the tracking process and the bundle adjustment process using the three-dimensional coordinates based on the calibration result.

 S7の情報処理装置1の移動を終了するか否かの判定処理と、S23の入退出情報の取得の処理は、第2の実施形態と同様である。 The process of determining whether or not to end the movement of the information processing device 1 in S7 and the process of acquiring the entry / exit information in S23 are the same as those in the second embodiment.

 As described above, the information processing device 1 of the present embodiment detects, from detection results such as captured images, the index information whose position is expressed in the coordinate system of the BIM information, and converts the environmental information into input values for the self-position estimation process or the map information generation process performed by the SLAM processing unit 2120, based on the coordinate system adjusted using the index information. Therefore, according to the information processing device 1 of the present embodiment, the error between the BIM information and the internal SLAM coordinate system of the information processing device 1 can be reduced, and the self-position estimation and the map information generation can be performed with higher accuracy.

 なお、本実施形態では、指標情報としてARマーカを例示したが、指標情報はこれに限定されるものではない。例えば、指標情報は、Lidarまたは各種センサで捕捉可能な標識等でもよいし、ビーコン等であってもよい。 In the present embodiment, the AR marker is illustrated as the index information, but the index information is not limited to this. For example, the index information may be a sign or the like that can be captured by Lidar or various sensors, or may be a beacon or the like.

 Further, in the present embodiment, the information processing device 1 has been described as having the functions of both the first embodiment and the second embodiment, but the information processing device 1 of the present embodiment does not have to include all the functions of the first and second embodiments. For example, the information processing device 1 may use the environmental information only for the bundle adjustment or only for generating the mask information. The environmental information also only needs to include any one of the three-dimensional design information, the entry/exit information, or the image recognition result of a person.

(Fourth Embodiment)
 In the second embodiment described above, the information processing device 1 uses the captured image for recognizing persons, but the use of the captured image is not limited to this. In this fourth embodiment, the information processing device 1 segments the captured image based on the recognition results of the objects depicted in the captured image, and performs the SLAM processing based on the segmentation result.

 本実施形態の情報処理装置1は、取得部101と、変換部102と、SLAM処理部120と、移動制御部105とを備える。 The information processing device 1 of the present embodiment includes an acquisition unit 101, a conversion unit 102, a SLAM processing unit 120, and a movement control unit 105.

 取得部101は、第1の実施形態と同様の機能を備える。具体的には、取得部101は、デバイスインタフェース15を介して、撮像装置17から撮像画像を取得する。 The acquisition unit 101 has the same function as that of the first embodiment. Specifically, the acquisition unit 101 acquires an captured image from the imaging device 17 via the device interface 15.

 変換部102は、第1の実施形態と同様の機能を備えた上で、取得部101によって取得された撮像画像を、撮像画像に描出された物体の認識結果に基づいてセグメンテーションする。 The conversion unit 102 has the same function as that of the first embodiment, and then segments the captured image acquired by the acquisition unit 101 based on the recognition result of the object drawn on the captured image.

 図12は、第4の実施形態に係る撮像画像60のセグメンテーションの一例を示す図である。図12の左側に示すように、撮像画像60には、情報処理装置1の周囲の環境が描出される。変換部102は、撮像画像60から、物体が描出された画像領域と、各物体の種別とを認識する。本実施形態においては、物体の認識結果は、物体が描出された画像領域の2次元座標と、各物体の種別とが対応付けられた情報とする。 FIG. 12 is a diagram showing an example of segmentation of the captured image 60 according to the fourth embodiment. As shown on the left side of FIG. 12, the captured image 60 depicts the environment around the information processing device 1. The conversion unit 102 recognizes the image area in which the object is drawn and the type of each object from the captured image 60. In the present embodiment, the recognition result of the object is information in which the two-dimensional coordinates of the image area in which the object is drawn and the type of each object are associated with each other.

 本実施形態においては、環境情報は、少なくとも撮像画像60を含むものとする。あるいは、撮像画像60自体ではなく、撮像画像60のセグメンテーション結果を、環境情報の一例としても良い。 In the present embodiment, the environmental information includes at least the captured image 60. Alternatively, the segmentation result of the captured image 60 may be used as an example of the environmental information instead of the captured image 60 itself.

 変換部102は、撮像画像60に描出された物体を認識する。なお、第1の実施形態と同様に、本実施形態においても、「物体」という場合は、壁や柱等の構造物、什器、家具、移動体、仮設物、および人物等を含むものとする。 The conversion unit 102 recognizes the object depicted in the captured image 60. As in the first embodiment, in the present embodiment as well, the term "object" includes structures such as walls and pillars, furniture, furniture, moving objects, temporary objects, people, and the like.

 変換部102は、例えば、ニューラルネットワーク等によって構成された学習済みモデルに撮像画像60を入力することにより、撮像画像60に描出された個々の物体を認識する。図12に示す例では、撮像画像60には、人物70と、箱75a,75bと、柱90と、壁91と、床92とが描出されている。変換部102は、これらの物体を認識する。“人物”、“箱”、“柱”、“壁”、および“床”は、物体の種別の一例である。なお、人物70の認識と、その他の物体の認識とは、別々に実行されても良い。 The conversion unit 102 recognizes the individual objects drawn on the captured image 60 by inputting the captured image 60 into the trained model configured by, for example, a neural network or the like. In the example shown in FIG. 12, the captured image 60 depicts a person 70, boxes 75a and 75b, a pillar 90, a wall 91, and a floor 92. The conversion unit 102 recognizes these objects. "People," "boxes," "pillars," "walls," and "floors" are examples of object types. The recognition of the person 70 and the recognition of other objects may be executed separately.

 The conversion unit 102 segments the captured image 60 based on the recognition results of the objects. The right side of FIG. 12 shows the segmentation result 61 of the captured image 60. In the example shown in FIG. 12, the conversion unit 102 segments the captured image 60 into four regions: the image area in which the pillar 90 and the wall 91 are depicted as region A1, the image area in which the floor 92 is depicted as region A2, the image area in which the boxes 75a and 75b are depicted as region A3, and the image area in which the person 70 is depicted as region A4. The unit of division is not limited to the example shown in FIG. 12. Hereinafter, when the regions A1 to A4 do not need to be distinguished, they are simply referred to as regions A.

 また、変換部102は、認識した物体、つまり、人物70、箱75a,75b、柱90、壁91、および床92を、常設された物体か否かによって分類する。例えば、柱90、壁91、および床92は建物9の一部であるため、常設された物体である。また、人物70、および箱75a,75bは、常設されていない物体である。各物体が常設されているか否かは、例えば、学習済みモデルによって判別される。 Further, the conversion unit 102 classifies the recognized objects, that is, the person 70, the boxes 75a and 75b, the pillar 90, the wall 91, and the floor 92 according to whether or not they are permanent objects. For example, the pillar 90, the wall 91, and the floor 92 are permanent objects because they are part of the building 9. The person 70 and the boxes 75a and 75b are non-permanent objects. Whether or not each object is permanently installed is determined by, for example, a trained model.

 常設された物体とは、一度設置されると、設置位置から移動しない物体である。例えば、上記の柱90、壁91、および床92のように、建物9の一部である物体は、基本的に移動しないため、常設された物体とする。また、常設されていない物体とは、設置位置から移動する可能性の高い物体である。例えば、人物70や、カートやフォークリフト等の移動体、一時的に設置された什器、および荷物の箱75a,75b等は、常設されていない物体とする。 A permanently installed object is an object that does not move from the installation position once it is installed. For example, an object that is a part of the building 9, such as the pillar 90, the wall 91, and the floor 92, is basically a permanent object because it does not move. An object that is not permanently installed is an object that is likely to move from the installation position. For example, a person 70, a moving body such as a cart or a forklift, temporarily installed fixtures, luggage boxes 75a, 75b, etc. are non-permanent objects.

 また、変換部102は、セグメンテーションした領域A1~A4と、各領域に描出された物体が常設された物体か否かを対応付ける。本実施形態においては、セグメンテーションした領域A1~A4と、各領域に描出された物体が常設された物体か否かを対応付けた情報を、セグメンテーション結果という。変換部102は、セグメンテーション結果を、SLAM処理部120に送出する。 Further, the conversion unit 102 associates the segmented areas A1 to A4 with whether or not the object drawn in each area is a permanently installed object. In the present embodiment, the information in which the segmented areas A1 to A4 are associated with whether or not the object drawn in each area is a permanently installed object is referred to as a segmentation result. The conversion unit 102 sends the segmentation result to the SLAM processing unit 120.
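As a non-authoritative illustration, the following sketch shows one way such a segmentation result could be represented, assuming a per-pixel label map produced by a trained segmentation model. The class names and the permanence table are illustrative assumptions, not part of the present disclosure.

```python
import numpy as np

# Classes assumed to correspond to parts of the building (permanently installed).
PERMANENT_CLASSES = {"pillar", "wall", "floor"}

def build_segmentation_result(label_map: np.ndarray, id_to_class: dict) -> dict:
    """Associate each segmented region with a permanence flag.

    label_map  : (H, W) array of integer region ids per pixel.
    id_to_class: mapping from region id to a class name string.
    Returns {region_id: {"mask": bool array, "class": str, "permanent": bool}}.
    """
    result = {}
    for region_id, class_name in id_to_class.items():
        mask = label_map == region_id                 # pixels belonging to this region
        result[region_id] = {
            "mask": mask,
            "class": class_name,
            "permanent": class_name in PERMANENT_CLASSES,
        }
    return result
```

In this sketch the segmentation result sent to the SLAM processing unit 120 is simply a dictionary of region masks, each paired with its permanence flag.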

In the above description, the object recognition and the segmentation based on the result of the object recognition have been described as separate processes, but these processes may be integrated. For example, a trained model that outputs a segmentation result of the captured image 60 when the captured image 60 is input may be adopted. In this case, the conversion unit 102 inputs the captured image 60 into the trained model and obtains the segmentation result output from the trained model.

The methods of object recognition from the captured image 60 and of segmentation of the captured image 60 are not limited to the above examples. For example, the conversion unit 102 may apply machine learning or deep learning techniques other than neural networks to perform the object recognition from the captured image 60 and the segmentation of the captured image 60.

The SLAM processing unit 120 of the present embodiment has the functions of the first embodiment and, in addition, performs the self-position estimation and the map information generation based on the segmentation result of the captured image 60.

For example, the SLAM processing unit 120 identifies the three-dimensional space corresponding to the regions A1 and A2 of the captured image 60 in which permanently installed objects are depicted, and makes that three-dimensional space the target of the self-position estimation process and the map information generation process. The SLAM processing unit 120 also identifies the three-dimensional space corresponding to the regions A3 and A4 of the captured image 60 in which non-permanent objects are depicted, and excludes that three-dimensional space from the self-position estimation process and the map information generation process. In this case, information representing the regions A3 and A4 in which non-permanent objects are depicted may be used as mask information representing regions to be excluded from the map information generation.

Alternatively, the SLAM processing unit 120 may use the regions A1 and A2 of the captured image 60 in which permanently installed objects are depicted for the SLAM processing, and may refrain from using the regions A3 and A4 of the captured image 60 in which non-permanent objects are depicted for the SLAM processing.

The weighting in the SLAM processing may also be changed according to whether or not the objects depicted in the regions A1 to A4 are permanently installed. For example, the SLAM processing unit 120 sets weighting coefficients for the regions A1 to A4 such that the weighting coefficients of the regions A1 and A2, in which permanently installed objects are depicted, are larger than the weighting coefficients of the regions A3 and A4, in which non-permanent objects are depicted. As a result, in the self-position estimation and the map information generation, the influence of the regions A1 and A2 of the captured image 60, in which permanently installed objects are depicted, becomes larger than that of the regions A3 and A4, in which non-permanent objects are depicted.
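A minimal sketch of such permanence-based weighting is shown below, assuming the segmentation result structure from the earlier sketch. The concrete weight values are illustrative assumptions; the description above only requires that permanently installed regions carry a larger weight.

```python
import numpy as np

def build_weight_map(seg_result: dict, shape: tuple,
                     w_permanent: float = 1.0,
                     w_non_permanent: float = 0.2) -> np.ndarray:
    """Return a per-pixel weight map for scoring SLAM residuals."""
    weights = np.full(shape, w_non_permanent, dtype=np.float32)
    for region in seg_result.values():
        if region["permanent"]:
            weights[region["mask"]] = w_permanent
    return weights

# In a weighted optimization, the residual of a point observed at pixel (u, v)
# would then be scaled by weights[v, u], so that permanent regions dominate.
```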

Further, the SLAM processing unit 120 may change the weighting coefficient for each type of object within the regions A in which non-permanent objects are depicted. For example, even among objects that are not permanently installed, some objects are relatively likely to remain at the same position for a long period, while others are relatively unlikely to do so. Large fixtures and furniture, for example, are classified as non-permanent objects because they may be moved, but compared with the person 70 and the like they are more likely to remain at the same position for a long period. Therefore, the SLAM processing unit 120 may set the weighting coefficients such that, among the regions A in which non-permanent objects are depicted, a region A depicting an object that is less likely to move is given a larger weighting coefficient.

The weighting coefficients for the regions A1 to A4 may be set by the conversion unit 102 instead of the SLAM processing unit 120.

As described above, the information processing device 1 of the present embodiment segments the captured image 60 captured by the imaging device 17 based on the recognition result of the objects depicted in the captured image 60, and performs the self-position estimation and the map information generation based on the segmentation result. Therefore, according to the information processing device 1 of the present embodiment, in addition to the effects of the first embodiment, whether an image region is used for the self-position estimation and the map information generation, or the strength of its influence on the self-position estimation and the map information generation, can be adjusted according to the objects depicted in the captured image 60, so that the accuracy of the self-position estimation and the accuracy of the map information can be improved.

The SLAM processing unit 120 described above may also perform the self-position estimation and the map information generation based on the segmentation result of the captured image 60 and three-dimensional design information such as BIM information.

For example, when the conversion unit 102 determines that an object depicted in the captured image 60 is not a permanent object, the SLAM processing unit 120 refers to the three-dimensional design information and determines whether or not the object is included in the design of the building 9. If an object determined not to be permanent based on the captured image 60 is not registered in the three-dimensional design information, the SLAM processing unit 120 adopts the determination result that the object is not permanent as it is. If, on the other hand, an object determined not to be permanent based on the captured image 60 is registered in the three-dimensional design information, the SLAM processing unit 120 changes the determination result from "not a permanent object" to "a permanent object."
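The following sketch illustrates one way this reconciliation could look. The helper `design_contains` is a placeholder assumption standing in for a query against the BIM/3D design data; its name, signature, and lookup logic are not taken from this disclosure.

```python
def design_contains(design_info: dict, class_name: str, position_3d) -> bool:
    """Placeholder lookup: is an element of this class registered in the design?"""
    return class_name in design_info.get("registered_classes", set())

def refine_permanence(region: dict, position_3d, design_info: dict) -> bool:
    """Return the final permanence flag for one segmented region."""
    if region["permanent"]:
        return True                        # image-based judgment kept as-is
    # Image says "not permanent": double-check against the building design.
    if design_contains(design_info, region["class"], position_3d):
        return True                        # registered in the design, so treat as permanent
    return False                           # keep the image-based judgment
```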

The conversion unit 102 may also evaluate, for example as a percentage, the confidence of the determination as to whether an object depicted in the captured image 60 is a permanent object. For example, when the confidence of the determination as to whether an object depicted in the captured image 60 is a permanent object is equal to or less than a reference value, the SLAM processing unit 120 may refer to the three-dimensional design information and determine whether or not the object is included in the design of the building 9. The reference value for the confidence of the determination is not particularly limited.

The process of comparing the three-dimensional design information with the image recognition result may be executed by the conversion unit 102 instead of the SLAM processing unit 120.

By using the result recognized from the captured image 60 by the trained model together with the three-dimensional design information in this way, the accuracy of the determination as to whether the object depicted in each region A is a permanent object can be improved.

The SLAM processing unit 120 may also use the segmentation result of the captured image 60 in the present embodiment in combination with either one or both of the entry/exit information of persons in the second embodiment described above and the image recognition result of the person 70 in the captured image.

(Modification 1)
In the first embodiment described above, the initial values of the three-dimensional coordinates of points on surrounding objects in the SLAM processing are obtained based on three-dimensional design information such as BIM information. In this modification, three-dimensional coordinates of points calculated from distance values estimated from a captured image 60 of the surroundings of the information processing device 1 are adopted as the initial values of the three-dimensional coordinates of points on surrounding objects in the SLAM processing.

For example, the conversion unit 102 of the information processing device 1 estimates, based on the captured image 60, the distance (depth) between an object depicted in the captured image 60 and the imaging device 17. This estimation process is referred to as a depth estimation process.

In this modification, the environmental information includes at least the captured image 60. Alternatively, the distance information estimated from the captured image 60, rather than the captured image 60 itself, may be used as an example of the environmental information.

For example, as described in the first embodiment, when the imaging device 17 is a stereo camera, the conversion unit 102 calculates the depth from the stereo parallax obtained for the captured image captured by one of the cameras included in the stereo camera.

The imaging device 17 may also be a monocular camera. In this case, the conversion unit 102 executes the depth estimation process using machine learning or deep learning techniques. For example, the conversion unit 102 may estimate the distance between an object depicted in the captured image 60 and the imaging device 17 using a trained model that, when the captured image 60 captured by the monocular camera is input, outputs a depth map corresponding to the captured image 60.

The trained model in this modification is, for example, a model that estimates depth from a monocular image by being trained, with stereo images as training data, to estimate the image paired with the monocular image. The method of estimating depth from a monocular image is not limited to this.

The conversion unit 102 sends the distance estimated from the captured image 60 to the SLAM processing unit 120 as an input value for the SLAM processing. More specifically, the conversion unit 102 uses the three-dimensional coordinates of points estimated from the distance estimated from the captured image 60 as initial values in the bundle adjustment process performed by the bundle adjustment unit 104. The conversion unit 102 may specify a range of initial values instead of specifying each initial value as a unique value.
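A minimal sketch of turning an estimated depth map into such initial values is shown below, assuming a pinhole camera model. The intrinsic parameters (fx, fy, cx, cy) and the source of the depth map are assumptions for illustration only.

```python
import numpy as np

def backproject_depth(depth: np.ndarray, fx: float, fy: float,
                      cx: float, cy: float) -> np.ndarray:
    """Convert an (H, W) depth map into an (N, 3) array of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]                  # keep pixels with a valid depth

# These camera-frame points, transformed by the current pose estimate of the
# imaging device 17, can serve as initial 3D point coordinates that the
# bundle adjustment then refines.
```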

When the imaging device 17 is installed at a position away from the center of the information processing device 1, the conversion unit 102 or the SLAM processing unit 120 corrects the distance estimated from the captured image 60 based on the positional offset between the imaging device 17 and the center of the information processing device 1.

According to this modification, by using the distance between the object estimated from the captured image 60 and the imaging device 17 as an input value of the SLAM processing, the amount of computation for the bundle adjustment process and the like can be reduced even without three-dimensional design information.

The conversion unit 102 may also generate the input values related to the distances to surrounding objects based on both the distance between the object and the imaging device 17 estimated from the captured image 60 and the distance between the information processing device 1 and the object calculated from the three-dimensional design information. For example, the conversion unit 102 may use, as initial values in the bundle adjustment process, the three-dimensional coordinates of points obtained from the average of the distance estimated from the captured image 60 and the distance calculated from the three-dimensional design information.
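The following sketch shows one possible way of combining the two distance sources, assuming both have already been converted to candidate 3D point coordinates in the same frame; simple averaging, as mentioned above, is only one of several possible choices.

```python
import numpy as np

def combined_initial_points(points_from_image: np.ndarray,
                            points_from_design: np.ndarray) -> np.ndarray:
    """Average per-point estimates of shape (N, 3) to obtain bundle-adjustment initials."""
    return 0.5 * (points_from_image + points_from_design)
```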

(Modification 2)
In the first to fourth embodiments described above, the SLAM processing unit 120 generates a point cloud map, but the form of the three-dimensional representation is not limited to a point cloud map.

For example, the SLAM processing units 120, 1120, and 2120 (hereinafter collectively referred to as the SLAM processing unit 120) may generate, as the map information, a set of a plurality of figures having three-dimensional coordinates.

FIG. 13 is a diagram showing an example of map information according to Modification 2. The map information 500 shown in FIG. 13 is obtained by fitting a plurality of triangular figures (triangular patches, or a triangular-patch-cloud) 501a to 501f (hereinafter referred to as triangular patches 501) to a two-dimensional captured image 45.

Each triangular patch 501 is a planar figure, but its position and orientation can be changed in three-dimensional space. The orientation of a triangular patch 501 is represented by a normal vector n, and its position is represented by three-dimensional coordinates. The position and orientation of each triangular patch 501 correspond to the depth of the two-dimensional captured image 45.

The SLAM processing unit 120 generates three-dimensional map information by optimizing the positions of the center points and the normal vectors of the plurality of triangular patches 501 fitted to the captured image 45.
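A minimal sketch of this patch parameterization is given below, assuming each patch is described by its center point and unit normal as stated above. Only the representation is shown; a real implementation would optimize these parameters by minimizing photometric or reprojection error over all patches.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class TriangularPatch:
    center: np.ndarray   # (3,) 3D coordinates of the patch center point
    normal: np.ndarray   # (3,) unit normal vector n

    def depth_at(self, ray: np.ndarray) -> float:
        """Depth along a viewing ray (unit vector from the camera) to the patch plane."""
        denom = float(self.normal @ ray)
        if abs(denom) < 1e-9:
            return np.inf                 # ray parallel to the patch plane
        return float(self.normal @ self.center) / denom

patches = [TriangularPatch(center=np.array([0.0, 0.0, 2.0]),
                           normal=np.array([0.0, 0.0, 1.0]))]
# Optimizing `center` and `normal` of every patch against the observed images
# yields the three-dimensional map information 500.
```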

By generating the map information as a set of triangular patches 501 in this way, the information processing device 1 of this modification can generate map information that densely represents the surrounding environment while reducing the amount of computation compared with individually calculating the three-dimensional coordinates of points in three-dimensional space.

In FIG. 13, the triangular patches 501 are fitted to the captured image 45, but the triangular patches 501 may instead be fitted to BIM information. For example, the conversion units 102, 1102, and 2102 (hereinafter referred to as the conversion unit 102) may fit the triangular patches 501 to three-dimensional design information such as BIM information. For example, based on the BIM information, the conversion unit 102 can set the boundaries of the triangular patches 501 at boundaries of the three-dimensional structure, in addition to boundaries that appear as edges in the captured image.

When this configuration is adopted, the SLAM processing unit 120 can generate more accurate map information by correcting the positions and orientations of the plurality of triangular patches 501 fitted by the conversion unit 102 based on the SLAM result.

The figures constituting the map information are not limited to the triangular patches 501; the SLAM processing unit 120 may generate the map information using a mesh representation or three-dimensional polygons.

(Modification 3)
In the first to third embodiments described above, the environmental information is the three-dimensional design information, the entry/exit information of persons in the building 9, or the image recognition result of a person in a captured image, but the environmental information is not limited to these.

For example, in this modification, the environmental information includes information on at least one of the ambient lighting and the weather. The information on the ambient lighting is, for example, information indicating whether the lighting of each room or each floor of the building 9 is on or off. The information on the weather is information on sunshine conditions, such as sunny, cloudy, or rainy, in the area including the building 9. The environmental information may include both the information on the ambient lighting and the information on the weather, or only one of them.

For example, the acquisition units 101 and 1101 (hereinafter referred to as the acquisition unit 101) acquire the information on the ambient lighting or the weather from the external device 2.

The conversion unit 102 generates, based on the information on the ambient lighting or the weather acquired by the acquisition unit 101, mask information representing regions in which the captured image is likely to be degraded. The mask information of the second embodiment may be distinguished as first mask information, and the mask information of this modification as second mask information.
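One simple, non-authoritative way to derive such a second mask is to flag saturated pixels, with thresholds adjusted according to the lighting and weather information; the thresholds and the condition keys below are illustrative assumptions.

```python
import numpy as np

def second_mask(gray: np.ndarray, lights_on: bool, weather: str) -> np.ndarray:
    """Return a boolean mask of pixels likely to be over- or under-exposed (gray in [0, 255])."""
    high, low = 250, 5
    if not lights_on:
        low = 30                           # lights off: expect crushed shadows
    if weather == "sunny":
        high = 240                         # strong sunlight: expect blown highlights
    return (gray >= high) | (gray <= low)
```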

In the regions corresponding to the mask information, the SLAM processing unit 120 of this modification does not use the captured image for at least one of the self-position estimation and the map information generation. For example, in the regions corresponding to the mask information, the SLAM processing unit 120 may refrain from using the captured image for both the self-position estimation process and the map information generation process, or only for one of them. For example, in the regions corresponding to the mask information, the SLAM processing unit 120 may use the captured image for the self-position estimation process for movement but not for the map information generation.

Specifically, overexposed (blown-out) regions or underexposed (crushed) regions may occur in the captured image depending on the lighting or sunshine conditions, and using such regions may lower the accuracy of the self-position estimation or the map information. In this modification, by not using the captured image for the self-position estimation or the map information generation in regions where such phenomena may occur, the decrease in the accuracy of the self-position estimation or the map information is reduced.

In the regions corresponding to the mask information, the SLAM processing unit 120 may use the captured image with a lower priority for the self-position estimation or the map information generation instead of not using it at all. For example, when the information processing device 1 includes a sensor or the like that detects the surrounding state in addition to the imaging device 17, the SLAM processing unit 120 uses the detection result of that sensor or the like, in preference to the captured image, for the self-position estimation or the map information generation in the regions corresponding to the mask information.

(Modification 4)
The conversion unit 102 may also change the gradation of the captured image based on the environmental information. For example, the conversion unit 102 reduces blown-out highlights or crushed shadows by changing the dynamic range of the captured image based on the information on the ambient lighting or the weather.

In this case, the SLAM processing unit 120 performs the self-position estimation and the map information generation based on the captured image whose gradation has been changed by the conversion unit 102.
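A minimal sketch of such a gradation change is shown below, using simple gamma correction driven by the environmental information. The gamma values chosen per condition are illustrative assumptions; any tone mapping that widens the usable dynamic range would serve the same purpose.

```python
import numpy as np

def adjust_gradation(gray: np.ndarray, lights_on: bool, weather: str) -> np.ndarray:
    """Return a tone-adjusted copy of a grayscale image with values in [0, 255]."""
    if not lights_on:
        gamma = 0.6                        # brighten dark indoor scenes
    elif weather == "sunny":
        gamma = 1.4                        # tame strong highlights
    else:
        gamma = 1.0
    normalized = gray.astype(np.float32) / 255.0
    return np.clip((normalized ** gamma) * 255.0, 0, 255).astype(np.uint8)
```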

Therefore, according to the information processing device 1 of this modification, the self-position estimation and the map information generation can be performed robustly in response to the surrounding environment, such as the lighting conditions or the sunshine conditions.

(Modification 5)
The environmental information may also include the three-dimensional design information and process information representing the construction process of the building 9.

The process information in this modification is information representing the construction schedule, or timeline, of the building 9. When the building 9 is under construction, collating three-dimensional design information such as BIM information with the process information makes it possible to distinguish between areas of the building 9 where construction has been completed and areas still under construction. Since the three-dimensional design information basically represents a three-dimensional model of the building 9 in its completed state, there is a high possibility that, in areas still under construction, the three-dimensional design information differs from the actual state of the building 9.

The conversion unit 102 of this modification generates, based on the three-dimensional design information and the process information, unfinished area information representing the areas of the building 9 in which construction has not been completed.

For the areas corresponding to the unfinished area information, the SLAM processing unit 120 of this modification performs the self-position estimation and the map information generation without using the three-dimensional design information. For example, for the areas corresponding to the unfinished area information, the SLAM processing unit 120 performs the self-position estimation and the map information generation based on the captured images or the detection results of sensors and the like.
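A minimal sketch of deriving and applying the unfinished area information is given below, assuming the schedule maps each design element id to its planned completion date; the field names and the date-based check are assumptions for illustration.

```python
from datetime import date

def unfinished_areas(design_elements: dict, schedule: dict, today: date) -> set:
    """Return ids of design elements whose construction is not yet complete."""
    unfinished = set()
    for element_id in design_elements:
        completion = schedule.get(element_id)
        if completion is None or completion > today:
            unfinished.add(element_id)
    return unfinished

def usable_design_elements(design_elements: dict, unfinished: set) -> dict:
    """Keep only completed elements as 3D design constraints for the SLAM processing."""
    return {eid: geom for eid, geom in design_elements.items()
            if eid not in unfinished}
```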

Therefore, the information processing device 1 of this modification does not use the three-dimensional design information in areas where the three-dimensional design information is highly likely to differ from the actual state of the building 9, thereby reducing the decrease in the accuracy of the self-position estimation and the map information even when the building 9 is under construction.

In the areas corresponding to the unfinished area information, the SLAM processing unit 120 may use the three-dimensional design information with a lower priority instead of not using it at all.

(Modification 6)
The use of the unfinished area information described in Modification 5 is not limited to the above example.

For example, when generating the map information of the building 9, the SLAM processing unit 120 may generate the map information only for the areas corresponding to the unfinished area information. That is, the SLAM processing unit 120 estimates that the structure of the building 9 does not change in the areas where construction has been completed, and reduces the amount of computation by generating the map information only for the areas where the structure of the building 9 changes, that is, the areas corresponding to the unfinished area information.

When obtaining the position and orientation of the imaging device 17 in the tracking process, the tracking unit 103 of the SLAM processing unit 120 may perform the tracking process using captured images of areas other than the areas corresponding to the unfinished area information. This is because, in the areas corresponding to the unfinished area information, the structures serving as subjects change due to the construction work, so it may be difficult to track the points 50 between captured images captured at different times.
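The following sketch shows one way tracking could be restricted to completed areas, assuming a boolean image mask `unfinished_mask` (True where a pixel shows an unfinished area) and keypoints given as (u, v) pixel coordinates; both the mask projection and the keypoint format are assumptions for illustration.

```python
def keypoints_for_tracking(keypoints, unfinished_mask):
    """Keep only keypoints that fall outside the unfinished areas."""
    kept = []
    for (u, v) in keypoints:
        if not unfinished_mask[int(v), int(u)]:
            kept.append((u, v))
    return kept
```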

(Modification 7)
In the first to third embodiments described above, an example has been described in which the information processing device 1 executes the estimation of the current self-position and the generation of the map information in real time while moving through the building 9, but the execution timing of the self-position estimation process and the map information generation process is not limited to this. For example, after the movement is completed, the information processing device 1 may execute the self-position estimation process or the map information generation process based on the detection results of the surrounding state of the information processing device 1 or the state of the information processing device 1 detected during the movement.

(Modification 8)
In the first to third embodiments described above, the information processing device 1 executes the self-position estimation process and the map information generation process, but a configuration in which the external device 2 executes the self-position estimation process and the map information generation process may also be adopted. For example, the external device 2 may execute the process of estimating the position of the information processing device 1 and the process of generating the map information based on the detection results acquired from the information processing device 1 and the environmental information. In this case, the external device 2 may be regarded as an example of the information processing device.

In the present specification (including the claims), when an expression such as "at least one of a, b, and c" or "at least one of a, b, or c" (including similar expressions) is used, it includes any of a, b, c, a-b, a-c, b-c, and a-b-c. It may also include multiple instances of any element, such as a-a, a-b-b, and a-a-b-b-c-c. It further includes adding elements other than the listed elements (a, b, and c), such as a-b-c-d, which additionally has d.

In the present specification (including the claims), when expressions such as "with data as input," "based on data," "in accordance with data," or "in response to data" (including similar expressions) are used, unless otherwise noted, they include cases where the data itself is used as input and cases where data subjected to some processing (for example, noise-added data, normalized data, or intermediate representations of the data) is used as input. When it is stated that some result is obtained "based on," "in accordance with," or "in response to" data, this includes cases where the result is obtained based only on that data, and may also include cases where the result is obtained under the influence of other data, factors, conditions, and/or states in addition to that data. When it is stated that "data is output," unless otherwise noted, this includes cases where the data itself is used as output and cases where data subjected to some processing (for example, noise-added data, normalized data, or intermediate representations of the data) is used as output.

In the present specification (including the claims), the terms "connected" and "coupled" are intended as non-limiting terms that include direct connection/coupling, indirect connection/coupling, electrical connection/coupling, communicative connection/coupling, operative connection/coupling, physical connection/coupling, and the like. The terms should be interpreted appropriately according to the context in which they are used, and connection/coupling forms that are not intentionally or naturally excluded should be interpreted non-restrictively as being included in the terms.

In the present specification (including the claims), when the expression "A configured to B" is used, it may include that the physical structure of the element A has a configuration capable of executing the operation B and that a permanent or temporary setting (setting/configuration) of the element A is configured or set to actually execute the operation B. For example, when the element A is a general-purpose processor, it suffices that the processor has a hardware configuration capable of executing the operation B and is configured to actually execute the operation B by a permanent or temporary setting of programs (instructions). When the element A is a dedicated processor, a dedicated arithmetic circuit, or the like, it suffices that the circuit structure of the processor is implemented so as to actually execute the operation B, regardless of whether control instructions and data are actually attached.

In the present specification (including the claims), when terms meaning inclusion or possession (for example, "comprising/including" and "having") are used, they are intended as open-ended terms, including the case of containing or possessing objects other than the object indicated by the object of the term. When the object of these terms meaning inclusion or possession is an expression that does not specify a quantity or that suggests the singular (an expression using "a" or "an" as an article), the expression should be interpreted as not being limited to a specific number.

In the present specification (including the claims), even if an expression such as "one or more" or "at least one" is used in one place and an expression that does not specify a quantity or that suggests the singular (an expression using "a" or "an" as an article) is used in another place, the latter expression is not intended to mean "one." In general, expressions that do not specify a quantity or that suggest the singular (expressions using "a" or "an" as an article) should be interpreted as not necessarily being limited to a specific number.

In the present specification, when it is described that a specific advantage/result is obtained for a specific configuration of an embodiment, it should be understood, unless there is a particular reason otherwise, that the advantage is also obtained for one or more other embodiments having that configuration. However, it should be understood that the presence or absence of the advantage generally depends on various factors, conditions, and/or states, and that the advantage is not necessarily obtained by the configuration. The advantage is merely obtained by the configuration described in the embodiments when various factors, conditions, and/or states are satisfied, and the advantage is not necessarily obtained in an invention according to a claim that defines the configuration or a similar configuration.

In the present specification (including the claims), when terms such as "maximize" are used, they include finding a global maximum value, finding an approximation of a global maximum value, finding a local maximum value, and finding an approximation of a local maximum value, and should be interpreted appropriately according to the context in which the terms are used. They also include finding approximations of these maximum values probabilistically or heuristically. Similarly, when terms such as "minimize" are used, they include finding a global minimum value, finding an approximation of a global minimum value, finding a local minimum value, and finding an approximation of a local minimum value, and should be interpreted appropriately according to the context in which the terms are used. They also include finding approximations of these minimum values probabilistically or heuristically. Similarly, when terms such as "optimize" are used, they include finding a global optimum value, finding an approximation of a global optimum value, finding a local optimum value, and finding an approximation of a local optimum value, and should be interpreted appropriately according to the context in which the terms are used. They also include finding approximations of these optimum values probabilistically or heuristically.

In the present specification (including the claims), when a plurality of pieces of hardware perform predetermined processes, the pieces of hardware may cooperate to perform the predetermined processes, or some of the hardware may perform all of the predetermined processes. Some hardware may perform part of a predetermined process, and other hardware may perform the rest of the predetermined process. When an expression such as "one or more pieces of hardware perform a first process and the one or more pieces of hardware perform a second process" is used in the present specification (including the claims), the hardware that performs the first process and the hardware that performs the second process may be the same or different. That is, it suffices that the hardware that performs the first process and the hardware that performs the second process are included in the one or more pieces of hardware. The hardware may include an electronic circuit, a device including an electronic circuit, or the like.

In the present specification (including the claims), when a plurality of storage devices (memories) store data, each of the plurality of storage devices (memories) may store only part of the data or may store the whole of the data.

As described above, according to the first to third embodiments, the accuracy of the self-position estimation and of the map information can be improved.

Although the embodiments of the present disclosure have been described in detail above, the present disclosure is not limited to the individual embodiments described above. Various additions, changes, replacements, partial deletions, and the like are possible without departing from the conceptual idea and spirit of the present invention derived from the contents defined in the claims and their equivalents. For example, in all of the embodiments described above, numerical values and mathematical expressions used in the description are shown as examples and are not limiting. The order of the operations in the embodiments is also shown as an example and is not limiting.

Claims (17)

1. An information processing device comprising:
at least one memory; and
at least one processor,
wherein the at least one processor is configured to:
acquire a detection result including either a surrounding state of the information processing device or a state of the information processing device, and environmental information regarding an environment around the information processing device; and
perform self-position estimation and generation of map information based on the environmental information and the detection result.
2. The information processing device according to claim 1, wherein the environmental information includes three-dimensional design information of a building.
3. The information processing device according to claim 2, wherein the at least one processor:
generates information regarding positions of surrounding objects based on the three-dimensional design information; and
performs the self-position estimation and the generation of the map information based on the information regarding the positions of the surrounding objects.
4. The information processing device according to claim 3, wherein:
the detection result includes a plurality of captured images captured by an imaging device mounted on the information processing device; and
the at least one processor estimates a position and an orientation of the imaging device based on the plurality of captured images, and generates the information regarding the positions of the surrounding objects based on the three-dimensional design information and the position and orientation of the imaging device.
5. The information processing device according to claim 4, wherein the at least one processor:
estimates positions of planes or curved surfaces of the surrounding objects based on the three-dimensional design information; and
generates the information regarding the positions of the surrounding objects based on a constraint condition that a plurality of points existing in the surroundings are located on the planes or the curved surfaces.
6. The information processing device according to any one of claims 2 to 5, wherein the at least one processor:
detects, from the detection result, index information whose position is expressed in a coordinate system of the three-dimensional design information; and
adjusts a coordinate system representing the self-position so as to be consistent with the coordinate system of the three-dimensional design information based on a detection result of the index information.
7. The information processing device according to any one of claims 2 to 6, wherein the at least one processor:
fits a set of a plurality of figures having three-dimensional coordinates to the three-dimensional design information;
corrects positions and orientations of the plurality of figures based on a result of the self-position estimation; and
generates, as the map information, the set of the plurality of figures whose positions and orientations have been corrected.
8. The information processing device according to any one of claims 2 to 7, wherein:
the environmental information includes the three-dimensional design information and process information representing a construction process of the building; and
the at least one processor generates, based on the three-dimensional design information and the process information, unfinished area information representing an area of the building in which construction has not been completed, and performs the self-position estimation and the generation of the map information for an area corresponding to the unfinished area information without using the three-dimensional design information.
9. The information processing device according to any one of claims 1 to 8, wherein the at least one processor:
generates, based on the environmental information, mask information representing an area to be excluded from the generation of the map information; and
does not generate the map information for an area corresponding to the mask information.
10. The information processing device according to claim 9, wherein:
the environmental information includes either entry/exit information of a person in a building in which the information processing device is located or an image recognition result of a person in a captured image captured by an imaging device mounted on the information processing device; and
the at least one processor generates the mask information based on either the entry/exit information or the image recognition result of the person.
11. The information processing device according to any one of claims 1 to 10, wherein the at least one processor:
divides a captured image captured by an imaging device mounted on the information processing device into regions based on a recognition result of objects depicted in the captured image; and
performs the self-position estimation and the generation of the map information based on a result of the region division.
12. The information processing device according to any one of claims 1 to 10, wherein:
the environmental information includes information regarding either ambient lighting or weather;
the detection result includes a captured image captured by an imaging device mounted on the information processing device; and
the at least one processor generates, based on the environmental information, second mask information representing a region in which the captured image is likely to be degraded, and does not use the captured image in at least one of the self-position estimation and the generation of the map information for a region corresponding to the second mask information.
13. The information processing device according to any one of claims 1 to 12, wherein:
the environmental information includes information regarding either ambient lighting or weather;
the detection result includes a captured image captured by an imaging device mounted on the information processing device; and
the at least one processor changes a gradation of the captured image based on the environmental information, and performs the self-position estimation and the generation of the map information based on the captured image whose gradation has been changed.
14. The information processing device according to any one of claims 1 to 13, wherein:
the environmental information includes a captured image captured by an imaging device mounted on the information processing device; and
the at least one processor generates information regarding positions of surrounding objects based on the captured image, and performs the self-position estimation and the generation of the map information based on the information regarding the positions of the surrounding objects.
15. The information processing device according to any one of claims 1 to 14, wherein the at least one processor:
calculates positions of surrounding objects as spatial coordinates of a plurality of points in a three-dimensional space; and
outputs the calculated spatial coordinates of the plurality of points as the map information.
16. An information processing method comprising:
acquiring, by at least one processor, a detection result including either a surrounding state of an information processing device or a state of the information processing device, and environmental information regarding an environment around the information processing device; and
performing, by the at least one processor, self-position estimation and generation of map information based on the environmental information and the detection result.
17. A program causing at least one computer to execute:
acquiring a detection result including at least either a surrounding state of an information processing device or a state of the information processing device, and environmental information regarding an environment around the information processing device; and
performing self-position estimation and generation of map information based on the environmental information and the detection result.

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004133567A (en) * 2002-10-09 2004-04-30 Hitachi Ltd Moving object and its position detecting device
JP2008129614A (en) * 2006-11-16 2008-06-05 Toyota Motor Corp Mobile system
JP2018164966A (en) * 2017-03-28 2018-10-25 Shimizu Corporation POSITION ESTIMATION DEVICE, ROBOT, AND POSITION ESTIMATION METHOD

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023070207A (en) * 2021-11-09 2023-05-19 Mitsubishi Electric Corporation Image recognition device, image recognition system, and image recognition method
JP7720771B2 (en) 2021-11-09 2025-08-08 Mitsubishi Electric Corporation Image recognition device, image recognition system, and image recognition method
WO2023085183A1 (en) * 2021-11-10 2023-05-19 Sony Group Corporation Information processing device, information processing method, and mobile object
JP2023122807A (en) * 2022-02-24 2023-09-05 Hitachi Global Life Solutions, Inc. Autonomous robot
JP2025084676A (en) * 2023-11-22 2025-06-03 Delta Electronics, Inc. Computer program product for 3D modeling and method for removing moving objects therefrom
JP7765572B2 (en) 2023-11-22 2025-11-06 Delta Electronics, Inc. Computer program product for 3D modeling and method for removing moving objects therefrom
WO2025197855A1 (en) * 2024-03-19 2025-09-25 Aisin Corporation Map creation device, map creation method, and map creation program

Similar Documents

Publication Publication Date Title
WO2021210492A1 (en) Information processing device, information processing method, and program
US8401242B2 (en) Real-time camera tracking using depth maps
WO2019138678A1 (en) Information processing device, control method for same, program, and vehicle driving assistance system
JP7131994B2 (en) Self-position estimation device, self-position estimation method, self-position estimation program, learning device, learning method and learning program
JP5480667B2 (en) Position / orientation measuring apparatus, position / orientation measuring method, program
JP2020030204A (en) Distance measurement method, program, distance measurement system and movable object
CN113870343A (en) Relative pose calibration method, device, computer equipment and storage medium
CN103926933A (en) Indoor simultaneous locating and environment modeling method for unmanned aerial vehicle
JP2019125116A (en) Information processing device, information processing method, and program
CN110260866A (en) A robot localization and obstacle avoidance method based on a vision sensor
CN110764110B (en) Path navigation method, device and computer readable storage medium
CN112967340A (en) Simultaneous positioning and map construction method and device, electronic equipment and storage medium
KR20210116161A (en) Heterogeneous sensors calibration method and apparatus using single checkerboard
Hsu et al. Application of multisensor fusion to develop a personal location and 3D mapping system
JP2020149186A (en) Position / orientation estimation device, learning device, mobile robot, position / orientation estimation method, learning method
JP2022138037A (en) Information processing device, information processing method and program
WO2022014322A1 (en) Information processing system and information processing device
US20190369631A1 (en) Intelligent wheelchair system based on big data and artificial intelligence
WO2022198508A1 (en) Lens abnormality prompt method and apparatus, movable platform, and readable storage medium
KR20240123863A (en) Greenhouse robot and method for estimating its position and posture
US20240106998A1 (en) Miscalibration detection for virtual reality and augmented reality systems
US11294379B2 (en) Systems and methods for controlling intelligent wheelchair
CN120510212A (en) Robust visual inertia SLAM method, storage medium and device integrating dotted line flow characteristics
US20240069203A1 (en) Global optimization methods for mobile coordinate scanners
CN117541655A (en) A method to integrate visual semantics to eliminate z-axis cumulative error in radar mapping

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21788847
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21788847
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: JP