
WO2022080554A1 - Apparatus for generating three-dimensional map by using stereo frames of lightweight augmented reality device, and method therefor - Google Patents

Apparatus for generating three-dimensional map by using stereo frames of lightweight augmented reality device, and method therefor

Info

Publication number
WO2022080554A1
WO2022080554A1 PCT/KR2020/015393 KR2020015393W WO2022080554A1 WO 2022080554 A1 WO2022080554 A1 WO 2022080554A1 KR 2020015393 W KR2020015393 W KR 2020015393W WO 2022080554 A1 WO2022080554 A1 WO 2022080554A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
sub
map
pose
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/KR2020/015393
Other languages
French (fr)
Korean (ko)
Inventor
장준환
박우출
양진욱
윤상필
최민수
이준석
송수호
구본재
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Korea Electronics Technology Institute
Original Assignee
Korea Electronics Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Korea Electronics Technology Institute

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/55 Depth or shape recovery from multiple images
    • G06T7/593 Depth or shape recovery from multiple images from stereo images
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/006 Mixed reality
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Definitions

  • the present invention relates to a technology for constructing a three-dimensional map, and more particularly, to an apparatus for generating a three-dimensional map using a stereo frame of a lightweight augmented reality device, and a method therefor.
  • Virtual reality refers to a specific environment or situation or the technology itself that is similar to reality created by artificial technology using a computer, etc. but is not real.
  • Augmented reality is a field of virtual reality (VR) and is a computer graphic technique that synthesizes virtual objects or information in an actual environment to make them appear as if they exist in the original environment.
  • augmented reality is a technology that superimposes virtual objects on the real world that users see with their eyes.
  • MR mixed reality
  • Augmented reality, a concept that complements the real world with a virtual world, uses a virtual environment created with computer graphics, but the real environment plays the main role. Computer graphics serve to provide additional information needed in the real environment. By overlaying a 3D virtual image on the real image the user is viewing, augmented reality blurs the distinction between the real environment and the virtual screen.
  • Virtual reality technology immerses the user in the virtual environment, making it impossible to see the real environment.
  • Augmented reality technology, in which the real environment and virtual objects are mixed, lets users see the real environment, providing better realism and additional information.
  • A lightweight augmented reality device in the form of glasses is similar in size to the largest eyeglasses, so its width is less than 20 cm.
  • In a lightweight augmented reality device, the stereo camera (camera unit) must be installed within a span of 20 cm or less, which inevitably results in a narrow parallax. With such a narrow disparity, distance detection for distant space becomes inaccurate, which limits the types of augmented reality service that can be offered.
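  • The effect of a narrow baseline can be illustrated with the standard pinhole stereo relation Z = f * B / d and its first-order error term. The sketch below is only an illustration: the focal length, the two baselines, and the half-pixel matching error are assumed values, and the function names are hypothetical; none of them come from the source.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Pinhole stereo relation Z = f * B / d (depth in metres)."""
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_err_px):
    """First-order depth uncertainty: dZ ~= Z**2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px


# Assumed, illustrative values (not taken from the source).
FOCAL_PX = 700.0          # focal length in pixels
DISPARITY_ERR_PX = 0.5    # feature matching error in pixels

for baseline_m in (0.06, 0.15):          # both fit within a 20 cm wide frame
    for depth_m in (1.0, 5.0, 10.0):
        err = depth_error(FOCAL_PX, baseline_m, depth_m, DISPARITY_ERR_PX)
        print(f"B={baseline_m:.2f} m  Z={depth_m:4.1f} m  error ~ {err:.3f} m")
```

  • Under these assumed values, the estimated error grows with the square of the distance and shrinks in proportion to the baseline, which is why a camera pair confined to 20 cm or less struggles with distant space.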
  • An augmented reality device can provide a service without a sense of incongruity only if its location is detected at every moment, so the focus is on real-time detection rather than on the precision of location detection.
  • It is difficult to obtain the desired performance from the computing power of a lightweight augmented reality device.
  • The performance of a lightweight augmented reality device is further limited by power (battery) constraints.
  • An object of the present invention in consideration of the above is to provide an apparatus for generating a three-dimensional map using a stereo frame of a lightweight augmented reality device and a method therefor.
  • A service server for generating a three-dimensional map using stereo frames includes: a communication module that continuously receives a plurality of stereo frames, each including a main frame and a sub-frame of an image captured by an augmented reality device; a pose derivation unit that derives pose information from the main frame and the sub-frame; a depth derivation unit that derives a depth map from the main frame and the sub-frame using a deep learning model; and a 3D map generator that generates a 3D map based on the pose information and the depth map and transmits the generated 3D map to the augmented reality device through the communication module.
  • The pose derivation unit extracts a feature point representing the same object from the main frame and the sub-frame, and sequentially derives the pose information and a pose matrix from the change between the coordinates of the feature point in the main frame and the coordinates of the feature point in the sub-frame.
  • The depth derivation unit derives a transformation matrix using the known camera matrix of the augmented reality device and the pose matrix, uses the transformation matrix to generate, from the main frame, a simulated sub-frame that imitates the sub-frame, and derives the depth map through the deep learning model according to the difference in coordinates between the pixels of the sub-frame and the pixels of the simulated sub-frame.
  • The depth derivation unit derives a transformation matrix using a known camera matrix and a pose matrix derived from a main frame for learning and a sub-frame for learning, and uses the transformation matrix to generate, from the main frame for learning, a simulated sub-frame for learning that imitates the sub-frame for learning. It then trains a prototype of the model on the correlation between depth and the coordinate difference between a pixel of the sub-frame for learning and a pixel of the simulated sub-frame for learning that represent the same real-world object, thereby generating a deep learning model that derives a depth map from the coordinate difference between the pixels of the sub-frame for learning and the pixels of the simulated sub-frame for learning.
  • A method for generating a three-dimensional map using stereo frames, performed by a service server to achieve the above object, includes: when a plurality of stereo frames, each including a main frame and a sub-frame of an image captured by an augmented reality device, are received through a communication module, deriving, by a pose derivation unit, pose information from the main frame and the sub-frame; deriving, by a depth derivation unit, a depth map from the main frame and the sub-frame using a deep learning model; generating, by a 3D map generator, a 3D map based on the pose information and the depth map; and transmitting, by the 3D map generator, the generated 3D map to the augmented reality device through the communication module.
  • The step of deriving the pose information from the main frame and the sub-frame includes: extracting, by the pose derivation unit, a feature point representing the same object from each of the main frame and the sub-frame; and sequentially deriving pose information and a pose matrix from the change between the coordinates of the feature point in the main frame and the coordinates of the feature point in the sub-frame.
  • The step of deriving the depth map includes: deriving, by the depth derivation unit, a transformation matrix using the known camera matrix of the augmented reality device and the pose matrix; generating, by the depth derivation unit, a simulated sub-frame that imitates the sub-frame from the main frame using the transformation matrix; and deriving, by the depth derivation unit, the depth map through the deep learning model according to the difference in coordinates between the pixels of the sub-frame and the pixels of the simulated sub-frame.
  • Before the step of deriving the pose information from the main frame and the sub-frame, the method further includes: deriving, by the depth derivation unit, a transformation matrix using a known camera matrix and a pose matrix derived from a main frame for learning and a sub-frame for learning; generating, by the depth derivation unit, a simulated sub-frame for learning that imitates the sub-frame for learning from the main frame for learning using the transformation matrix; and generating, by the depth derivation unit, a deep learning model that derives a depth map from the coordinate difference between the pixels of the sub-frame for learning and the pixels of the simulated sub-frame for learning, by training a prototype of the model on the correlation between depth and the coordinate difference between pixels of the two frames that represent the same real-world object.
  • the service server receives the stereo frame of the image captured by the augmented reality device, and derives a three-dimensional map therefrom.
  • the augmented reality device can match the virtual object to the image taken by using the 3D map derived from the service server. Since the 3D map of the present invention provides precise 3D coordinates, precise registration is possible when registering virtual objects. Accordingly, it is possible to provide augmented reality with higher realism. Moreover, since the 3D map of the present invention is generated by the service server rather than directly derived by the augmented reality device, it is possible to provide a high-quality service even when the augmented reality device is lightweight.
  • FIG. 1 is a diagram for explaining the configuration of a system for providing an augmented reality image according to an embodiment of the present invention.
  • FIG. 2 is a view for explaining the configuration of a lightweight augmented reality device according to an embodiment of the present invention.
  • FIG. 3 is a diagram for explaining the configuration of a service server for providing a virtual reality image according to an embodiment of the present invention.
  • FIG. 4 is a view for explaining a detailed configuration of a control unit according to an embodiment of the present invention.
  • FIGS. 5 and 6 are diagrams for explaining a method of deriving a pose matrix according to an embodiment of the present invention.
  • FIG. 7 is a diagram for explaining a method of deriving a transformation matrix using a pose matrix and a camera matrix according to an embodiment of the present invention.
  • FIG. 8 is a diagram for explaining a method of generating a simulated frame using a transform matrix according to an embodiment of the present invention.
  • FIG. 9 is a diagram for explaining a method for learning a correlation between pixels of two frames according to an embodiment of the present invention.
  • FIG. 10 is a flowchart illustrating a method for generating a 3D map using a stereo frame of a lightweight augmented reality device according to an embodiment of the present invention.
  • FIG. 1 is a diagram for explaining the configuration of a system for providing an augmented reality image according to an embodiment of the present invention.
  • A system for providing an augmented reality image includes an augmented reality device 10 and a service server 20.
  • the augmented reality device 10 and the service server 20 are connected through wireless communication.
  • the augmented reality device 10 may be any device capable of providing an augmented reality image to a user. Representatively, the augmented reality device 10 may be exemplified by smart glasses.
  • the service server 20 may be an edge cloud server located closest to the augmented reality device 10 or a high-performance PC connected to the augmented reality device 10 through Wi-Fi, etc.
  • The service server 20 receives the stereo frames of the image captured by the augmented reality device 10, generates a 3D map from the stereo frames, and provides the generated 3D map to the augmented reality device 10 by transmitting it.
  • Because the three-dimensional map is generated using the resources of the service server 20, that is, its computing power, the augmented reality device 10 does not require high performance. Therefore, the augmented reality device 10 can be lightweight.
  • FIG. 2 is a view for explaining the configuration of a lightweight augmented reality device according to an embodiment of the present invention.
  • The augmented reality device 10 includes a camera unit 11, a communication unit 12, a sensor unit 13, an audio unit 14, an input unit 15, a display unit 16, a storage unit 17, and a control unit 18.
  • the camera unit 11 is for capturing an image.
  • the camera unit 11 may be a stereo camera.
  • the image captured by the camera unit 11 includes a main frame and a sub frame.
  • The camera unit 11 may include two lenses and two image sensors corresponding to the main frame and the sub-frame, respectively.
  • Each image sensor receives light reflected from a subject and converts it into an electrical signal, and may be implemented based on a Charge-Coupled Device (CCD), a Complementary Metal-Oxide-Semiconductor (CMOS) sensor, or the like.
  • the camera unit 11 may further include one or more analog-to-digital converters, and may convert an electrical signal output from the image sensor into a digital sequence and output it to the control unit 18 .
  • the communication unit 12 is for communication with the service server 20 .
  • The communication unit 12 may include a radio frequency (RF) transmitter (Tx) that up-converts and amplifies the frequency of a transmitted signal, and an RF receiver (Rx) that low-noise amplifies a received signal and down-converts its frequency.
  • the communication unit 12 may include a modem that modulates a transmitted signal and demodulates a received signal.
  • the sensor unit 13 is for measuring inertia.
  • the sensor unit 13 includes an Inertial Measurement Unit (IMU), a Doppler Velocity Log (DVL), an Attitude and Heading Reference System (AHRS), and the like.
  • the sensor unit 13 measures inertial information including the position and speed of rotation and movement of the augmented reality device 10 and provides the measured inertial information of the augmented reality device 10 to the control unit 18 .
  • the audio unit 14 includes a speaker SPK for outputting an audio signal and a microphone MIKE for receiving an audio signal.
  • the audio unit 14 may output an audio signal through the speaker SPK or transmit an audio signal input through the microphone MIKE to the control unit 18 under the control of the control unit 18 .
  • the audio unit 14 serves to output an audio signal of the virtual reality image.
  • the input unit 15 receives a user's key manipulation for controlling the augmented reality device 10 , generates an input signal, and transmits it to the control unit 18 .
  • the input unit 15 may include various types of keys for controlling the augmented reality device 10 .
  • When the display unit 16 is implemented as a touch screen, the functions of the various keys can be performed on the display unit 16, and when all functions can be performed with the touch screen alone, the input unit 15 may be omitted.
  • the display unit 16 visually provides a menu of the augmented reality device 10, input data, function setting information, and various other information to the user.
  • the display unit 16 performs a function of outputting a boot screen, a standby screen, a menu screen, and the like of the augmented reality device 10 .
  • the display unit 16 performs a function of outputting a 3D map according to an embodiment of the present invention to the screen.
  • the display unit 16 may be formed of a liquid crystal display (LCD), an organic light emitting diode (OLED), an active matrix organic light emitting diode (AMOLED), or the like.
  • the display unit 16 may be implemented as a touch screen. In this case, the display unit 16 includes a touch sensor.
  • the touch sensor detects a user's touch input.
  • The touch sensor may be a touch-sensing sensor such as a capacitive overlay, a pressure type, a resistive overlay, or an infrared beam type, or may be a pressure sensor.
  • all types of sensor devices capable of sensing contact or pressure of an object may be used as the touch sensor of the present invention.
  • the touch sensor may detect a user's touch input, generate a detection signal including input coordinates indicating the touched position, and transmit it to the controller 18 .
  • some or all of the functions of the input unit 15 may be performed through the display unit 16 .
  • the storage unit 17 serves to store programs and data necessary for the operation of the augmented reality device 10 .
  • the storage unit 17 may store a camera matrix, a pose matrix, and the like.
  • the storage unit 17 may store a virtual object to be matched with a real image in order to provide an augmented reality image.
  • Various types of data stored in the storage unit 17 may be deleted, changed, or added according to a user's operation of the augmented reality device 10 .
  • the controller 18 may control the overall operation of the augmented reality device 10 and the signal flow between internal blocks of the augmented reality device 10 , and perform a data processing function of processing data.
  • the control unit 18 basically performs a role of controlling various functions of the augmented reality device (10).
  • the controller 18 may include a central processing unit (CPU), a baseband processor (BP), an application processor (AP), a graphic processing unit (GPU), a digital signal processor (DSP), and the like.
  • The controller 18 captures an image through the camera unit 11 and transmits the captured image to the service server 20 through the communication unit 12 in units of stereo frames, each including a main frame and a sub-frame. The service server 20 then generates a 3D map from the plurality of stereo frames and transmits the generated 3D map to the augmented reality device 10. Accordingly, the controller 18 may provide the augmented reality image by receiving the 3D map through the communication unit 12 and matching the virtual object to the captured image using the received 3D map.
  • the service server 20 for generating a three-dimensional map using the stereo frame of the lightweight augmented reality device 10 according to an embodiment of the present invention will be described.
  • FIG. 3 is a diagram for explaining the configuration of a service server for providing a virtual reality image according to an embodiment of the present invention.
  • the service server 20 according to an embodiment of the present invention includes a communication module 21 , a storage module 22 , and a control module 23 .
  • the communication module 21 is for communicating with the augmented reality device 10 through a network.
  • the communication module 21 may transmit/receive data to and from the augmented reality device 10 .
  • the communication module 21 may include an RF (Radio Frequency) transmitter (Tx) for up-converting and amplifying the frequency of a transmitted signal, and an RF receiver (Rx) for low-noise amplifying a received signal and down-converting the frequency.
  • the communication module 21 may include a modem for modulating a signal to be transmitted and demodulating a signal to be received in order to transmit/receive data.
  • the communication module 21 may transmit data received from the control module 23 , for example, a 3D map to the augmented reality device 10 .
  • the communication module 21 may transmit data received from the augmented reality device 10 , for example, a stereo frame to the control module 23 .
  • the storage module 22 serves to store programs and data necessary for the operation of the service server 20 .
  • the storage module 22 may store various parameters for generating a 3D map corresponding to the augmented reality device 10 , for example, a camera matrix of the augmented reality device 10 .
  • Various types of data stored in the storage module 22 may be registered, deleted, changed, or added according to the operation of the administrator of the service server 20.
  • The control module 23 may control the overall operation of the service server 20 and the signal flow between internal blocks of the service server 20, and may perform a data processing function of processing data.
  • The control module 23 may be a central processing unit, a digital signal processor, or the like.
  • the control module 23 may further include an image processor or a graphic processing unit (GPU).
  • FIG. 4 is a view for explaining a detailed configuration of a control unit according to an embodiment of the present invention.
  • FIGS. 5 and 6 are diagrams for explaining a method of deriving a pose matrix according to an embodiment of the present invention.
  • FIG. 7 is a diagram for explaining a method of deriving a transformation matrix using a pose matrix and a camera matrix according to an embodiment of the present invention.
  • FIG. 8 is a diagram for explaining a method of generating a simulated frame using a transform matrix according to an embodiment of the present invention.
  • FIG. 9 is a diagram for explaining a method for learning a correlation between pixels of two frames according to an embodiment of the present invention.
  • The control module 23 includes a pose derivation unit 210, a depth derivation unit 220, and a 3D map generator 230.
  • the pose derivation unit 210 is for obtaining a pose of the augmented reality device 10 .
  • The pose derivation unit 210 extracts a feature point representing the same object from each of the main frame and the sub-frame of a stereo frame of the image captured through the camera unit 11 of the augmented reality device 10, and derives pose information by calculating the change in the coordinates of the extracted feature point.
  • the first pose Pose1 corresponding to the main frame MF and the second pose Pose2 corresponding to the sub frame SF are as shown in FIG. 5 .
  • a screen example of the main frame MF and the sub-frame SF is shown in FIG. 6 .
  • A feature point P indicating the same object is extracted from the main frame MF and the sub-frame SF. It can be seen that there is a difference between the coordinates of the feature point P(m) in the main frame MF and the coordinates of the feature point P(s) in the sub-frame SF.
  • The pose derivation unit 210 extracts a feature point P representing the same object, and can derive pose information from the change between the coordinates of the feature point P(m) in the main frame MF and the coordinates of the feature point P(s) in the sub-frame SF.
  • the pose derivation unit 210 derives pose information (position, rotation information) by calculating the change in feature points, and expresses the pose information as a matrix to derive a pose matrix.
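  • The source does not specify the layout of the pose matrix. A common convention, assumed here, packs the rotation R and the translation t into a 4x4 homogeneous matrix [R t; 0 1]. The following is a minimal sketch of that convention only (it does not estimate the pose from feature points); the helper names and the example values are hypothetical.

```python
import math


def rotation_y(angle_rad):
    """3x3 rotation about the camera's vertical (y) axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]


def pose_matrix(rotation, translation):
    """Pack rotation R (3x3) and translation t (3 values) into a 4x4 matrix [R t; 0 1]."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    return rows


def transform_point(pose, point):
    """Apply a 4x4 pose matrix to a 3D point given as (x, y, z)."""
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    return tuple(sum(pose[r][c] * homogeneous[c] for c in range(4))
                 for r in range(3))


# Hypothetical relative pose between the main frame and the sub-frame:
# a 6 cm sideways shift and a 2 degree yaw (illustrative values only).
PM = pose_matrix(rotation_y(math.radians(2.0)), (-0.06, 0.0, 0.0))
print(transform_point(PM, (0.0, 0.0, 1.0)))
```

  • Expressing the pose this way lets position and rotation be applied to a point with a single matrix product, which is convenient when the pose is later combined with the camera matrix.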
  • the depth derivation unit 220 is for obtaining a depth map using a deep learning model (DLM).
  • the depth derivation unit 220 generates a deep learning model (DLM) by using learning data including a main frame for learning (MF) and a sub-frame for learning (SF), and such a method is as follows.
  • The depth derivation unit 220 first derives a transformation matrix (TM) using a pose matrix (PM) and a camera matrix (CM).
  • the pose matrix PM is a matrix expression of pose information derived from the main frame MF for learning and the subframe SF for learning by the pose derivation unit 210 .
  • the camera matrix CM is an internal parameter of the camera unit 11 of the augmented reality device 10 .
  • The camera matrix CM is received in advance from the augmented reality device 10 and stored in the storage module 22.
  • The depth derivation unit 220 transforms the main frame MF for learning using the transformation matrix TM to generate a simulated sub-frame for learning SF' that imitates the sub-frame SF for learning.
  • With respect to a prototype of the model, the depth derivation unit 220 trains (by deep learning) on the correlation between depth and the coordinate difference between a pixel of the sub-frame for learning SF and a pixel of the simulated sub-frame for learning SF' that represent the same real-world object. This generates a deep learning model (DLM) that derives a depth map from the coordinate difference between the pixels of the sub-frame for learning SF and the pixels of the simulated sub-frame for learning SF'.
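  • The source does not give a formula for the transformation matrix TM. One common depth-independent choice, assumed here purely for illustration, is the rotation-only homography TM = CM * R * CM^-1, where R is the rotation part of the pose matrix PM. The sketch below builds that matrix and warps one pixel of the main frame into the simulated sub-frame; the intrinsics, the 2 degree yaw, and all function names are hypothetical.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def invert_camera_matrix(cm):
    """Closed-form inverse of CM = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (zero skew assumed)."""
    fx, fy, cx, cy = cm[0][0], cm[1][1], cm[0][2], cm[1][2]
    return [[1.0 / fx, 0.0, -cx / fx],
            [0.0, 1.0 / fy, -cy / fy],
            [0.0, 0.0, 1.0]]


def transformation_matrix(cm, rotation):
    """Assumed form TM = CM * R * CM^-1 (not stated in the source)."""
    return matmul3(matmul3(cm, rotation), invert_camera_matrix(cm))


def warp_pixel(tm, u, v):
    """Map pixel (u, v) of the main frame to its location in the simulated sub-frame."""
    x = tm[0][0] * u + tm[0][1] * v + tm[0][2]
    y = tm[1][0] * u + tm[1][1] * v + tm[1][2]
    w = tm[2][0] * u + tm[2][1] * v + tm[2][2]
    return x / w, y / w


# Illustrative intrinsics and a 2 degree yaw between the two views (assumed values).
CM = [[700.0, 0.0, 320.0],
      [0.0, 700.0, 240.0],
      [0.0, 0.0, 1.0]]
angle = math.radians(2.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]

TM = transformation_matrix(CM, R)
print(warp_pixel(TM, 320.0, 240.0))  # where the principal point of MF lands in SF'
```

  • Under this assumption, the warp removes only the rotation between the two views. Any offset that remains between a pixel of SF and the matching pixel of SF' is then caused by the translation between the views and is larger for nearer objects, which is consistent with using the coordinate difference as a cue for depth.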
  • the depth derivation unit 220 receives the main frame MF and the sub frame SF from the augmented reality device 10 through the communication module 21 .
  • a transformation matrix TM is derived using the pose matrix PM and the camera matrix CM.
  • The depth derivation unit 220 transforms the main frame MF using the transformation matrix TM to generate a simulated sub-frame SF' that imitates the sub-frame SF.
  • the depth derivation unit 220 inputs the sub-frame SF and the simulated sub-frame SF' to the deep learning model DLM.
  • the deep learning model (DLM) derives a depth map according to the difference in coordinates between the pixels of the sub-frame SF and the pixels of the simulated sub-frame SF'.
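  • To make the idea of learning a correlation between coordinate difference and depth concrete, the sketch below replaces the deep learning model with a deliberately tiny stand-in: a one-parameter least-squares fit of depth = a / difference. This is not the model described in the source, and the training pairs, the inverse form, and the function names are assumptions chosen only for illustration.

```python
def fit_inverse_model(coordinate_diffs, depths):
    """Least-squares fit of depth ~= a / diff, returning the single parameter a."""
    xs = [1.0 / d for d in coordinate_diffs]
    return sum(x * z for x, z in zip(xs, depths)) / sum(x * x for x in xs)


def predict_depth_map(diff_map, a):
    """Turn a 2D grid of pixel coordinate differences into a depth map."""
    return [[a / d for d in row] for row in diff_map]


# Hypothetical training pairs: coordinate difference in pixels, depth in metres.
train_diffs = [42.0, 21.0, 8.4, 4.2]
train_depths = [1.0, 2.0, 5.0, 10.0]
a = fit_inverse_model(train_diffs, train_depths)

# Hypothetical 2x2 grid of coordinate differences between SF and SF'.
diff_map = [[42.0, 21.0], [8.4, 4.2]]
print(predict_depth_map(diff_map, a))
```

  • A real deep learning model would learn a far richer mapping from image data, but the training and inference roles are the same: fit on frames with known depth, then apply the fitted mapping to the coordinate differences of newly received frames.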
  • The 3D map generator 230 derives a 3D map from the frames of the captured image by using the pose information obtained by the pose derivation unit 210 and the depth map derived by the depth derivation unit 220. That is, the position and rotation information between the main frame MF and the sub-frame SF of the augmented reality device 10 is known from the pose information, and the depth between the main frame MF and the sub-frame SF is known from the depth map. The 3D map generator 230 may therefore generate a 3D map by converting the 2D coordinates of the pixels of the corresponding frame into 3D coordinates using the position and rotation information and the depth.
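  • The conversion from 2D pixel coordinates to 3D coordinates can be sketched with the standard pinhole back-projection X = Z * CM^-1 [u, v, 1]^T followed by the pose transform R * X + t. The source does not spell out these formulas, so the code below is an assumption-based illustration with hypothetical function names and example values.

```python
def backproject(u, v, depth, cm):
    """Pixel (u, v) with depth Z -> camera-frame point (zero-skew pinhole model)."""
    fx, fy, cx, cy = cm[0][0], cm[1][1], cm[0][2], cm[1][2]
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def to_map_frame(point, rotation, translation):
    """Apply the pose to a camera-frame point: X_map = R * X_cam + t."""
    return tuple(sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
                 for r in range(3))


def build_point_cloud(depth_map, cm, rotation, translation):
    """Convert every pixel of a depth map (rows of depths) into a 3D map point."""
    points = []
    for v, row in enumerate(depth_map):
        for u, depth in enumerate(row):
            cam_point = backproject(u, v, depth, cm)
            points.append(to_map_frame(cam_point, rotation, translation))
    return points


# Illustrative inputs (assumed values): tiny 2x2 depth map, identity rotation.
CM = [[700.0, 0.0, 0.5], [0.0, 700.0, 0.5], [0.0, 0.0, 1.0]]
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = build_point_cloud([[2.0, 2.0], [3.0, 3.0]], CM, IDENTITY, (0.0, 0.0, 0.0))
print(cloud)
```

  • Each depth-map pixel yields one 3D point, so accumulating the points from successive stereo frames, each placed with its own pose, gives a map that grows as the device moves.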
  • FIG. 10 is a flowchart illustrating a method for generating a 3D map using a stereo frame of a lightweight augmented reality device according to an embodiment of the present invention.
  • A deep learning model (DLM) that derives a depth map according to the difference in pixel coordinates between a plurality of frames is generated in advance by deep learning of the model prototype.
  • the communication module 21 receives a plurality of frames including a main frame and a sub frame of a stereo image from the augmented reality device 10 in step S110 .
  • The pose derivation unit 210 derives the pose information and the pose matrix based on the main frame MF and the sub-frame SF in step S120. That is, with reference to FIGS. 5 and 6, the pose derivation unit 210 extracts a feature point P indicating the same object in each of the main frame MF and the sub-frame SF, and can derive pose information from the change between the coordinates of the feature point P(m) in the main frame MF and the coordinates of the feature point P(s) in the sub-frame SF. In this way, the pose derivation unit 210 derives pose information (position and rotation information) by calculating the change in feature points, and expresses the pose information as a matrix to derive a pose matrix.
  • the depth derivation unit 220 derives a depth map from the main frame MF and the sub-frame SF using the pose matrix and the deep learning model (DLM) in step S130 .
  • the step S130 will be described in more detail as follows.
  • As shown in FIG. 7, the depth derivation unit 220 first derives a transformation matrix TM using the pose matrix PM, derived from the main frame MF and the sub-frame SF by the pose derivation unit 210, and the known camera matrix CM of the camera unit 11 of the augmented reality device 10. Then, as shown in FIG. 8, the depth derivation unit 220 transforms the main frame MF using the transformation matrix TM to generate a simulated sub-frame SF' that imitates the sub-frame SF. Next, the depth derivation unit 220 inputs the sub-frame SF and the simulated sub-frame SF' to the deep learning model DLM. Then, the deep learning model (DLM) may derive a depth map according to the difference in coordinates between the pixels of the sub-frame SF and the pixels of the simulated sub-frame SF'.
  • In step S140, the 3D map generator 230 generates a three-dimensional map from the frames of the captured image using the pose information derived by the pose derivation unit 210 and the depth map derived by the depth derivation unit 220. That is, the position and rotation information between the main frame MF and the sub-frame SF of the augmented reality device 10 is known from the pose information, and the depth between the main frame MF and the sub-frame SF is known from the depth map. The 3D map generator 230 may therefore convert the 2D coordinates of the pixels of the corresponding frame into 3D coordinates using the position and rotation information and the depth.
  • the 3D map generation unit 230 transmits the generated 3D map to the augmented reality device 10 through the communication module 21 in step S150 .
  • The control unit 18 of the augmented reality device 10 may receive the 3D map through the communication unit 12.
  • In step S160, the control unit 18 retrieves the virtual object stored in the storage unit 17 and matches it to the image captured by the camera unit 11 according to the received 3D map, thereby creating an augmented reality image.
  • the control unit 18 may output the augmented reality image through the display unit 16 in step S170.
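  • The server-side portion of this flow (steps S110 to S150) can be summarized as a simple loop. The sketch below only mirrors the order of the steps; the function name, the callable parameters, and the stub values in the usage example are hypothetical and stand in for the pose derivation unit, the depth derivation unit, and the 3D map generator.

```python
def generate_3d_map(stereo_frames, derive_pose, derive_depth, build_map):
    """Run steps S120 to S140 for each received (main frame, sub-frame) pair."""
    map_points = []
    for main_frame, sub_frame in stereo_frames:                 # S110: frames received
        pose = derive_pose(main_frame, sub_frame)               # S120: pose information
        depth_map = derive_depth(main_frame, sub_frame, pose)   # S130: depth map
        map_points.extend(build_map(pose, depth_map))           # S140: 3D coordinates
    return map_points                                           # S150: sent to the device


# Usage with trivial stand-in callables (placeholders, not real algorithms).
frames = [("MF0", "SF0"), ("MF1", "SF1")]
points = generate_3d_map(
    frames,
    derive_pose=lambda mf, sf: (mf, sf),
    derive_depth=lambda mf, sf, pose: [[1.0]],
    build_map=lambda pose, depth_map: [(0.0, 0.0, depth_map[0][0])],
)
print(len(points))
```

  • Passing the three stages in as callables keeps the sketch independent of any particular pose or depth algorithm, in line with the division of the control module 23 into separate units.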
  • As described above, a virtual object can be registered with a captured image using the derived 3D map. Since the 3D map of the present invention provides precise 3D coordinates, virtual objects can be registered precisely. Accordingly, augmented reality with higher realism can be provided. Moreover, since the 3D map of the present invention is generated by the service server 20 rather than derived directly by the augmented reality device 10, the augmented reality device 10 can be made lightweight.
  • The method according to the embodiment of the present invention described above may be implemented in the form of a program readable by various computer means and recorded on a computer-readable recording medium.
  • The recording medium may include program instructions, data files, data structures, and the like, alone or in combination.
  • The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the art of computer software.
  • The recording medium includes magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • Examples of program instructions include machine code, such as that generated by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.
  • Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.
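
The conversion from 2D pixel coordinates to 3D coordinates described for step S140 can be illustrated with a standard pinhole back-projection. The following is a minimal sketch, not the patent's implementation: the function names, the intrinsics `fx`, `fy`, `cx`, `cy`, and the convention that the 4x4 pose matrix maps camera coordinates into map coordinates are all assumptions made for illustration.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with depth Z -> point (X, Y, Z) in the camera frame.

    Standard pinhole model; fx, fy, cx, cy are assumed intrinsics.
    """
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def to_map_frame(point, pose):
    """Apply a 4x4 pose matrix [R | t] to a 3D point.

    Assumption: the pose maps camera coordinates into map coordinates.
    """
    x, y, z = point
    return tuple(
        pose[i][0] * x + pose[i][1] * y + pose[i][2] * z + pose[i][3]
        for i in range(3)
    )


def depth_map_to_points(depth_map, pose, fx, fy, cx, cy):
    """Convert every pixel with a valid (positive) depth into a 3D map point."""
    points = []
    for v, row in enumerate(depth_map):
        for u, depth in enumerate(row):
            if depth > 0:
                camera_point = backproject(u, v, depth, fx, fy, cx, cy)
                points.append(to_map_frame(camera_point, pose))
    return points
```

Here the depth map supplies Z for each pixel, and the pose supplies the rotation and translation that place the resulting points in a common map frame.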

Abstract

According to the present invention, a service server for generating a three-dimensional map by using stereo frames comprises: a communication module for continuously receiving a plurality of frames including a main frame and sub-frames of a video captured by an augmented reality device; a pose derivation unit for obtaining pose information from the main frame and the sub-frames; a depth map calculation unit, which uses a deep learning model to derive a depth map from the main frame and the sub-frames; and a three-dimensional map generation unit for generating a three-dimensional map on the basis of the pose information and the depth map, and transmitting the generated three-dimensional map to the augmented reality device through the communication module.

Description

Apparatus and method for generating a three-dimensional map using stereo frames of a lightweight augmented reality device

The present invention relates to a technology for constructing a three-dimensional map, and more particularly, to an apparatus for generating a three-dimensional map using stereo frames of a lightweight augmented reality device, and a method therefor.

Virtual reality (VR) refers to a specific environment or situation, or the technology itself, that is created artificially with a computer and resembles reality without being real. Augmented reality (AR) is a field of virtual reality and is a computer graphics technique that composites virtual objects or information into a real environment so that they appear to be objects existing in that environment. In other words, augmented reality superimposes virtual objects on the real world that the user sees. Because it combines the real world with a virtual world carrying additional information in real time and shows them as a single image, it is also called mixed reality (MR). Augmented reality, a concept in which the virtual world complements the real world, uses a virtual environment created with computer graphics, but the leading role belongs to the real environment. Computer graphics serve to provide additional information needed in the real environment. That is, a three-dimensional virtual image is overlapped on the live image the user is viewing so that the distinction between the real environment and the virtual screen becomes blurred.

Virtual reality technology immerses the user in a virtual environment, so the user cannot see the real environment. Augmented reality technology, in which the real environment and virtual objects are mixed, lets the user see the real environment and thus provides better realism and additional information.

A lightweight augmented reality device will be at most about the size of a pair of glasses, so its width is 20 cm or less. An augmented reality service requires tracking the position and gaze of the augmented reality device. If a stereo camera is used for this purpose, a lightweight augmented reality device must mount the stereo camera (camera unit) within a narrow span of 20 cm or less, which inevitably yields a narrow parallax. With such a narrow parallax, distance detection for far space becomes inaccurate, which limits the forms of augmented reality service that can be offered. Because an augmented reality device can provide a seamless service only if its position is detected at every moment, most approaches focus on real-time detection rather than on the precision of position detection. In addition, techniques for improving position detection precision are difficult to run at the desired performance on the computing power of a lightweight augmented reality device. Moreover, the performance of a lightweight augmented reality device is further limited by power (battery) constraints.
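
The effect of a narrow baseline on far-range accuracy can be made concrete with the textbook relation for a rectified pinhole stereo pair, Z = f·B/d, whose first-order depth uncertainty is ΔZ ≈ Z²/(f·B)·Δd. This relation is standard stereo geometry and is not stated in the patent itself; the focal length and baselines below are assumed values chosen only for illustration.

```python
def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified pinhole stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(focal_px, baseline_m, depth_m, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~= Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_error_px


if __name__ == "__main__":
    focal_px = 700.0  # assumed focal length in pixels (illustrative only)
    for baseline_m in (0.10, 0.20, 0.50):
        for depth_m in (2.0, 10.0):
            err = depth_error(focal_px, baseline_m, depth_m)
            print(f"B={baseline_m:.2f} m, Z={depth_m:4.1f} m: "
                  f"about {err:.3f} m of depth error per pixel of disparity error")
```

The error grows with the square of the distance and shrinks only linearly with the baseline, which is why a baseline capped at roughly 20 cm makes distant geometry hard to measure.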

In view of the above, an object of the present invention is to provide an apparatus for generating a three-dimensional map using stereo frames of a lightweight augmented reality device, and a method therefor.

To achieve the above object, a service server for generating a three-dimensional map using stereo frames according to a preferred embodiment of the present invention includes: a communication module that continuously receives a plurality of frames, including a main frame and a sub-frame, of an image captured by an augmented reality device; a pose derivation unit that obtains pose information from the main frame and the sub-frame; a depth map calculation unit that derives a depth map from the main frame and the sub-frame using a deep learning model; and a 3D map generator that generates a 3D map based on the pose information and the depth map and transmits the generated 3D map to the augmented reality device through the communication module.

The pose derivation unit extracts feature points representing the same object from the main frame and the sub-frame, and sequentially derives the pose information and a pose matrix from the change between the coordinates of the feature points in the main frame and the coordinates of the feature points in the sub-frame.
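
The patent does not specify how the pose is estimated from the feature correspondences (epipolar-geometry methods are commonly used for that step and are omitted here). The sketch below covers only the second half of the sequence, packing pose information into a pose matrix. It assumes, purely for illustration, that the pose information consists of yaw, pitch, and roll angles plus a translation, and that the rotation is composed as Rz(yaw)·Ry(pitch)·Rx(roll); neither the representation nor the name `pose_matrix` comes from the source.

```python
import math


def pose_matrix(yaw, pitch, roll, tx, ty, tz):
    """Build a 4x4 homogeneous pose matrix [R | t; 0 0 0 1].

    Assumed convention: R = Rz(yaw) @ Ry(pitch) @ Rx(roll), angles in radians.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr, tx],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr, ty],
        [-sp, cp * sr, cp * cr, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

The upper-left 3x3 block carries the rotation between the two frames and the last column carries the translation, so a single matrix holds both parts of the pose.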

The depth derivation unit derives a transformation matrix using the known camera matrix of the augmented reality device and the pose matrix, generates a simulated sub-frame that mimics the sub-frame from the main frame using the transformation matrix, and derives a depth map through the deep learning model according to the coordinate difference between the pixels of the sub-frame and the pixels of the simulated sub-frame.
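
The patent does not give the formula that combines the camera matrix and the pose matrix. One common construction, assumed here only for illustration, is the rotation-only homography TM = K·R·K⁻¹, where K is the camera matrix and R is the rotation block of the pose matrix. Under that assumption the warp removes the rotational part of the motion, and the remaining coordinate difference between a sub-frame pixel and its simulated counterpart is parallax that depends on depth. All function names below are hypothetical.

```python
def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def camera_matrix(fx, fy, cx, cy):
    """Pinhole camera matrix K (zero skew assumed)."""
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def camera_matrix_inverse(fx, fy, cx, cy):
    """Closed-form inverse of the zero-skew camera matrix K."""
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]


def transformation_matrix(k, k_inv, rotation):
    """TM = K @ R @ K^-1 (assumption: rotation-only homography)."""
    return matmul3(matmul3(k, rotation), k_inv)


def warp_pixel(tm, u, v):
    """Map a main-frame pixel into the simulated sub-frame."""
    x = tm[0][0] * u + tm[0][1] * v + tm[0][2]
    y = tm[1][0] * u + tm[1][1] * v + tm[1][2]
    w = tm[2][0] * u + tm[2][1] * v + tm[2][2]
    return x / w, y / w


def coordinate_difference(sub_pixel, simulated_pixel):
    """Per-pixel offset between the sub-frame and the simulated sub-frame."""
    return (sub_pixel[0] - simulated_pixel[0], sub_pixel[1] - simulated_pixel[1])
```

In the patent the coordinate differences are fed to a trained deep learning model rather than converted to depth analytically, so this sketch stops at producing the differences.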

The depth derivation unit derives a transformation matrix using a pose matrix derived from a training main frame and a training sub-frame and the known camera matrix, generates a simulated training sub-frame that mimics the training sub-frame from the training main frame using the transformation matrix, and trains a model prototype on the correlation between depth and the coordinate difference between the pixels of the training sub-frame and the pixels of the simulated training sub-frame for portions representing the same object in the real world, thereby generating a deep learning model that derives a depth map according to the coordinate difference between the pixels of the training sub-frame and the pixels of the simulated training sub-frame.

To achieve the above object, a method for a service server to generate a three-dimensional map using stereo frames according to a preferred embodiment of the present invention includes: a step in which a pose derivation unit, upon receiving through a communication module a plurality of stereo frames including a main frame and a sub-frame of an image captured by an augmented reality device, derives pose information from the main frame and the sub-frame; a step in which a depth map calculation unit derives a depth map from the main frame and the sub-frame using a deep learning model; a step in which a 3D map generator generates a 3D map based on the pose information and the depth map; and a step in which the 3D map generator transmits the generated 3D map to the augmented reality device through the communication module.

The step of deriving the pose information from the main frame and the sub-frame includes: a step in which the pose derivation unit extracts feature points representing the same object from each of the main frame and the sub-frame; and a step in which the pose derivation unit sequentially derives the pose information and a pose matrix from the change between the coordinates of the feature points in the main frame and the coordinates of the feature points in the sub-frame.

The step of generating the 3D map based on the pose information and the depth map includes: a step in which the depth derivation unit derives a transformation matrix using the known camera matrix of the augmented reality device and the pose matrix; a step in which the depth derivation unit generates, from the main frame, a simulated sub-frame that mimics the sub-frame using the transformation matrix; and a step in which the depth derivation unit derives a depth map through the deep learning model according to the coordinate difference between the pixels of the sub-frame and the pixels of the simulated sub-frame.

Before the step of deriving the pose information from the main frame and the sub-frame, the method further includes: a step in which the depth derivation unit derives a transformation matrix using a pose matrix derived from a training main frame and a training sub-frame and the known camera matrix; a step in which the depth derivation unit generates, from the training main frame, a simulated training sub-frame that mimics the training sub-frame using the transformation matrix; and a step in which the depth derivation unit trains a model prototype on the correlation between depth and the coordinate difference between the pixels of the training sub-frame and the pixels of the simulated training sub-frame for portions representing the same object in the real world, thereby generating a deep learning model that derives a depth map according to the coordinate difference between the pixels of the training sub-frame and the pixels of the simulated training sub-frame.

According to the present invention, as described above, the service server receives the stereo frames of the image captured by the augmented reality device and derives a 3D map from them. The augmented reality device can then register a virtual object with the captured image using the 3D map derived by the service server. Since the 3D map of the present invention provides precise 3D coordinates, virtual objects can be registered precisely. Accordingly, augmented reality with higher realism can be provided. Moreover, since the 3D map of the present invention is generated by the service server rather than derived directly by the augmented reality device, a high-quality service can be provided even when the augmented reality device is lightweight.

FIG. 1 is a diagram for explaining the configuration of a system for providing an augmented reality image according to an embodiment of the present invention.

FIG. 2 is a diagram for explaining the configuration of a lightweight augmented reality device according to an embodiment of the present invention.

FIG. 3 is a diagram for explaining the configuration of a service server for providing a virtual reality image according to an embodiment of the present invention.

FIG. 4 is a diagram for explaining the detailed configuration of a control unit according to an embodiment of the present invention.

FIGS. 5 and 6 are diagrams for explaining a method of deriving a pose matrix according to an embodiment of the present invention.

FIG. 7 is a diagram for explaining a method of deriving a transformation matrix using a pose matrix and a camera matrix according to an embodiment of the present invention.

FIG. 8 is a diagram for explaining a method of generating a simulated frame using a transformation matrix according to an embodiment of the present invention.

FIG. 9 is a diagram for explaining a method of learning the correlation of the pixel differences between two frames according to an embodiment of the present invention.

FIG. 10 is a flowchart for explaining a method for generating a three-dimensional map using stereo frames of a lightweight augmented reality device according to an embodiment of the present invention.

Prior to the detailed description of the present invention, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings; they should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concepts of terms in order to describe his or her own invention in the best way. Therefore, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical idea, so it should be understood that various equivalents and modifications that could replace them may exist at the time of filing of this application.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals in the accompanying drawings wherever possible. Detailed descriptions of well-known functions and configurations that may obscure the gist of the present invention are omitted. For the same reason, some components in the accompanying drawings are exaggerated, omitted, or schematically illustrated, and the size of each component does not fully reflect its actual size.

First, a system for providing an augmented reality (AR) image according to an embodiment of the present invention will be described. FIG. 1 is a diagram for explaining the configuration of a system for providing an augmented reality image according to an embodiment of the present invention.

Referring to FIG. 1, a system for providing a virtual reality image according to an embodiment of the present invention includes an augmented reality device 10 and a service server 20. The augmented reality device 10 and the service server 20 are connected through wireless communication.

The augmented reality device 10 may be any device capable of providing an augmented reality image to a user. A representative example of the augmented reality device 10 is a pair of smart glasses.

The service server 20 may be an edge cloud server located closest to the augmented reality device 10, a high-performance PC connected to the augmented reality device 10 through Wi-Fi or the like, or a similar machine.

According to the present invention, the service server 20 receives the stereo frames of the image captured by the augmented reality device 10, generates a 3D map from the stereo frames, and provides the generated 3D map to the augmented reality device 10. Because the 3D map is generated using the resources of the service server 20, that is, its computing power, the augmented reality device 10 does not need to be high-performance. Therefore, the augmented reality device 10 can be made lightweight.

Next, the augmented reality device 10 according to an embodiment of the present invention will be described. FIG. 2 is a diagram for explaining the configuration of a lightweight augmented reality device according to an embodiment of the present invention.

Referring to FIG. 2, the augmented reality device 10 according to an embodiment of the present invention includes a camera unit 11, a communication unit 12, a sensor unit 13, an audio unit 14, an input unit 15, a display unit 16, a storage unit 17, and a control unit 18.

The camera unit 11 captures images. In particular, the camera unit 11 according to an embodiment of the present invention may be a stereo camera. Accordingly, an image captured by the camera unit 11 includes a main frame and a sub-frame. To this end, the camera unit 11 may include two lenses and two image sensors corresponding to the main frame and the sub-frame, respectively. Each image sensor receives light reflected from a subject and converts it into an electrical signal, and may be implemented based on a charge-coupled device (CCD), a complementary metal-oxide semiconductor (CMOS), or the like. The camera unit 11 may further include one or more analog-to-digital converters, and may convert the electrical signal output from the image sensor into a digital sequence and output it to the control unit 18.

The communication unit 12 communicates with the service server 20. The communication unit 12 may include a radio frequency (RF) transmitter (Tx) that up-converts and amplifies the frequency of a transmitted signal, and an RF receiver (Rx) that low-noise amplifies a received signal and down-converts its frequency. The communication unit 12 may also include a modem that modulates the transmitted signal and demodulates the received signal.

The sensor unit 13 measures inertia. The sensor unit 13 includes an inertial measurement unit (IMU), a Doppler velocity log (DVL), an attitude and heading reference system (AHRS), and the like. The sensor unit 13 measures inertial information, including the position and speed of the rotation and movement of the augmented reality device 10, and provides the measured inertial information to the control unit 18.

The audio unit 14 includes a speaker (SPK) for outputting an audio signal and a microphone (MIKE) for receiving an audio signal. Under the control of the control unit 18, the audio unit 14 may output an audio signal through the speaker (SPK) or deliver an audio signal input through the microphone (MIKE) to the control unit 18. In particular, the audio unit 14 outputs the audio signal of the virtual reality image.

The input unit 15 receives the user's key input for controlling the augmented reality device 10, generates an input signal, and delivers it to the control unit 18. The input unit 15 may include various keys for controlling the augmented reality device 10. When the display unit 16 is a touch screen, the functions of these keys may be performed on the display unit 16, and when all functions can be performed with the touch screen alone, the input unit 15 may be omitted.

The display unit 16 visually provides the user with the menus of the augmented reality device 10, input data, function setting information, and various other information. The display unit 16 outputs screens such as the boot screen, standby screen, and menu screen of the augmented reality device 10. In particular, the display unit 16 outputs the 3D map according to an embodiment of the present invention on the screen. The display unit 16 may be formed of a liquid crystal display (LCD), organic light-emitting diodes (OLED), active-matrix organic light-emitting diodes (AMOLED), or the like. The display unit 16 may also be implemented as a touch screen, in which case it includes a touch sensor. The touch sensor detects the user's touch input. The touch sensor may be a touch-sensing sensor of a capacitive overlay, pressure, resistive overlay, or infrared beam type, or may be a pressure sensor. In addition to these sensors, any kind of sensor device capable of sensing the contact or pressure of an object may be used as the touch sensor of the present invention. The touch sensor may detect the user's touch input, generate a detection signal including input coordinates indicating the touched position, and transmit it to the control unit 18. In particular, when the display unit 16 is a touch screen, some or all of the functions of the input unit 15 may be performed through the display unit 16.

The storage unit 17 stores the programs and data necessary for the operation of the augmented reality device 10. In particular, the storage unit 17 may store a camera matrix, a pose matrix, and the like. The storage unit 17 may also store virtual objects to be registered with a real image in order to provide an augmented reality image. The various data stored in the storage unit 17 may be deleted, changed, or added according to the operation of the user of the augmented reality device 10.

The control unit 18 may control the overall operation of the augmented reality device 10 and the signal flow between its internal blocks, and may perform a data processing function. The control unit 18 also basically controls the various functions of the augmented reality device 10. Examples of the control unit 18 include a central processing unit (CPU), a baseband processor (BP), an application processor (AP), a graphics processing unit (GPU), and a digital signal processor (DSP).

The control unit 18 captures an image through the camera unit 11 and transmits the captured image to the service server 20 through the communication unit 12 in units of stereo frames, each including a main frame and a sub-frame. The service server 20 then generates a 3D map from the plurality of stereo frames including the main frame and the sub-frame, and transmits the generated 3D map to the augmented reality device 10. Accordingly, the control unit 18 may receive the 3D map through the communication unit 12 and provide an augmented reality image by registering a virtual object with the captured image using the received 3D map.

Next, the service server 20 for generating a 3D map using stereo frames of the lightweight augmented reality device 10 according to an embodiment of the present invention will be described. FIG. 3 is a diagram for explaining the configuration of a service server for providing a virtual reality image according to an embodiment of the present invention. Referring to FIG. 3, the service server 20 according to an embodiment of the present invention includes a communication module 21, a storage module 22, and a control module 23.

The communication module 21 communicates with the augmented reality device 10 through a network and may transmit data to and receive data from the augmented reality device 10. The communication module 21 may include a radio frequency (RF) transmitter (Tx) that up-converts and amplifies the frequency of a transmitted signal, and an RF receiver (Rx) that low-noise amplifies a received signal and down-converts its frequency. The communication module 21 may also include a modem that modulates signals to be transmitted and demodulates received signals. The communication module 21 may transmit data received from the control module 23, for example a 3D map, to the augmented reality device 10. It may also pass data received from the augmented reality device 10, for example a stereo frame, to the control module 23.

The storage module 22 stores programs and data necessary for the operation of the service server 20. For example, the storage module 22 may store various parameters for generating a 3D map for the augmented reality device 10, such as the camera matrix of the augmented reality device 10. The various data stored in the storage module 22 may be registered, deleted, changed, or added according to an operation by the administrator of the service server 20.

The control module 23 may control the overall operation of the service server 20 and the signal flow between its internal blocks, and may perform a data processing function. The control module 23 may be a central processing unit (CPU), a digital signal processor (DSP), or the like. The control module 23 may additionally include an image processor or a graphics processing unit (GPU).

The detailed configuration of the control module 23 described above will now be explained. FIG. 4 is a diagram for explaining the detailed configuration of the control module according to an embodiment of the present invention. FIGS. 5 and 6 are diagrams for explaining a method of deriving a pose matrix according to an embodiment of the present invention. FIG. 7 is a diagram for explaining a method of deriving a transformation matrix using a pose matrix and a camera matrix according to an embodiment of the present invention. FIG. 8 is a diagram for explaining a method of generating a simulated frame using the transformation matrix according to an embodiment of the present invention. FIG. 9 is a diagram for explaining a method of learning the correlation between pixel differences of two frames and depth according to an embodiment of the present invention.

Referring to FIG. 4, the control module 23 includes a pose derivation unit 210, a depth derivation unit 220, and a 3D map generation unit 230.

The pose derivation unit 210 obtains the pose of the augmented reality device 10. The pose derivation unit 210 extracts a feature point representing the same object from each of the main frame and the sub frame of a stereo frame of the image captured through the camera unit 11 of the augmented reality device 10, and calculates pose information from the change in the coordinates of the extracted feature point.

For example, assume that the first pose (Pose1) corresponding to the main frame (MF) and the second pose (Pose2) corresponding to the sub frame (SF) are as shown in FIG. 5. Example screens of the main frame (MF) and the sub frame (SF) in this case are shown in FIG. 6. Assume that a feature point P representing the same object has been extracted from the main frame (MF) and the sub frame (SF) in FIG. 6. The coordinates of the feature point P(m) in the main frame (MF) differ from the coordinates of the feature point P(s) in the sub frame (SF). The pose derivation unit 210 therefore extracts the feature point P representing the same object and derives pose information from the change between the coordinates of P(m) in the main frame (MF) and the coordinates of P(s) in the sub frame (SF). In this way, the pose derivation unit 210 calculates the change in feature points to derive pose information (position and rotation information), and expresses the pose information as a matrix to derive the pose matrix.
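As a minimal sketch of what "expressing the pose information as a matrix" can look like, the following assumes the usual 4x4 homogeneous form that packs a 3x3 rotation and a translation vector. The function names `pose_matrix`, `rotation_y`, and `feature_shift` and the example values are hypothetical and are not taken from the source, which does not specify the matrix layout.

```python
import numpy as np

def rotation_y(angle_rad):
    """3x3 rotation about the camera's vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def pose_matrix(rotation, translation):
    """Pack rotation (3x3) and translation (3,) into a 4x4 homogeneous matrix.

    Assumption: the pose matrix (PM) uses this common layout.
    """
    pose = np.eye(4)
    pose[:3, :3] = rotation
    pose[:3, 3] = translation
    return pose

def feature_shift(p_main, p_sub):
    """Coordinate change of one feature point between main and sub frame."""
    return np.asarray(p_sub, dtype=float) - np.asarray(p_main, dtype=float)

# Example: no rotation, camera shifted 0.1 units sideways.
pm = pose_matrix(rotation_y(0.0), [0.1, 0.0, 0.0])
shift = feature_shift((320, 240), (345, 240))  # P(m) -> P(s)
```

Estimating the rotation and translation from many such feature shifts is a separate step that the text leaves unspecified; the sketch only shows the matrix representation.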

The depth derivation unit 220 obtains a depth map using a deep learning model (DLM). The depth derivation unit 220 generates the deep learning model (DLM) using training data that includes a training main frame (MF) and a training sub frame (SF), as follows. First, as shown in FIG. 7, the depth derivation unit 220 derives a transformation matrix (TM) using a pose matrix (PM) and a camera matrix (CM). Here, the pose matrix (PM) is the matrix representation of the pose information derived by the pose derivation unit 210 from the training main frame (MF) and the training sub frame (SF). The camera matrix (CM) holds the intrinsic parameters of the camera unit 11 of the augmented reality device 10; it is received in advance from the augmented reality device 10 and stored in the storage module 22. Next, as shown in FIG. 8, the depth derivation unit 220 transforms the training main frame (MF) using the transformation matrix (TM) to generate a simulated training sub frame (SF') that imitates the training sub frame (SF). Then, as shown in FIG. 9, the depth derivation unit 220 trains the model prototype (deep learning) on the correlation between depth and the coordinate difference between pixels of the training sub frame (SF) and pixels of the simulated training sub frame (SF') that represent the same real-world object. This produces a deep learning model (DLM) that derives a depth map from the coordinate difference between the pixels of the training sub frame (SF) and the pixels of the simulated training sub frame (SF').
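The text does not give a formula for the transformation matrix. The following is a minimal sketch, assuming the common pinhole view-synthesis form: a main-frame pixel is lifted to 3D at an assumed depth, moved by the pose matrix, and reprojected with the camera matrix. The function name `transform_pixel`, the example intrinsics, and the assumed depth are illustrative and not taken from the source.

```python
import numpy as np

def transform_pixel(camera_matrix, pose, uv, assumed_depth):
    """Map a main-frame pixel to its simulated sub-frame position.

    camera_matrix: 3x3 intrinsics (CM); pose: 4x4 pose matrix (PM).
    assumed_depth is a hypothetical placeholder depth for the pixel.
    """
    u, v = uv
    ray = np.linalg.inv(camera_matrix) @ np.array([u, v, 1.0])  # normalized ray
    point_main = ray * assumed_depth             # 3D point in main-camera frame
    point_sub = pose[:3, :3] @ point_main + pose[:3, 3]  # move into sub frame
    projected = camera_matrix @ point_sub        # back to homogeneous pixels
    return projected[:2] / projected[2]

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
shifted_pose = np.eye(4)
shifted_pose[0, 3] = 0.1                          # 0.1 units sideways

same = transform_pixel(K, np.eye(4), (320, 240), 2.0)
moved = transform_pixel(K, shifted_pose, (320, 240), 2.0)
```

With an identity pose the pixel stays in place; with the sideways shift it moves by focal length times shift divided by depth. Under this assumption the coordinate difference depends on depth, which is the kind of relation the model is described as learning.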

After the deep learning model (DLM) has been generated as described above, when the main frame (MF) and the sub frame (SF) received from the augmented reality device 10 through the communication module 21 are input, the depth derivation unit 220 derives the transformation matrix (TM) using the pose matrix (PM) and the camera matrix (CM), as shown in FIG. 7. Then, as shown in FIG. 8, the depth derivation unit 220 transforms the main frame (MF) using the transformation matrix (TM) to generate a simulated sub frame (SF') that imitates the sub frame (SF). Next, the depth derivation unit 220 inputs the sub frame (SF) and the simulated sub frame (SF') to the deep learning model (DLM). The deep learning model (DLM) then derives a depth map from the coordinate difference between the pixels of the sub frame (SF) and the pixels of the simulated sub frame (SF').
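The architecture of the deep learning model is not specified in the text. As a toy stand-in only, the sketch below fits a one-parameter model that maps a pixel coordinate difference to depth, assuming depth is inversely proportional to the difference (which holds for a purely sideways camera shift). `fit_depth_model` and `predict_depth` are hypothetical names, and the training values are synthetic.

```python
import numpy as np

def fit_depth_model(coord_diff, depth):
    """Least-squares fit of depth ~ a / coord_diff; returns the scalar a."""
    x = 1.0 / np.asarray(coord_diff, dtype=float)
    y = np.asarray(depth, dtype=float)
    return float(x @ y / (x @ x))

def predict_depth(a, coord_diff):
    """Apply the fitted relation to new coordinate differences."""
    return a / np.asarray(coord_diff, dtype=float)

# Synthetic training pairs: (coordinate difference in pixels, depth).
train_diff = [50.0, 25.0, 10.0]
train_depth = [1.0, 2.0, 5.0]

a = fit_depth_model(train_diff, train_depth)
estimated = predict_depth(a, [20.0])
```

A real model would take the two frames as input and output a dense depth map; the sketch only illustrates the "learn the correlation, then apply it" split between training and inference.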

The 3D map generation unit 230 derives a 3D map from the frames of the captured image using the pose information obtained by the pose derivation unit 210 and the depth map derived by the depth derivation unit 220. That is, the pose information gives the position and rotation between the main frame (MF) and the sub frame (SF) of the augmented reality device 10, and the depth map gives the depth between the main frame (MF) and the sub frame (SF). The 3D map generation unit 230 can therefore use the position and rotation information together with the depth to convert the 2D coordinates of the pixels of the frame into 3D coordinates and generate the 3D map.
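The conversion of 2D pixel coordinates into 3D coordinates can be sketched as a pinhole back-projection. This is a minimal sketch under assumptions: the function name `backproject` is hypothetical, the pose is taken to be a 4x4 camera-to-map transform, and the source does not state which convention it uses.

```python
import numpy as np

def backproject(depth_map, camera_matrix, cam_to_map=None):
    """Turn an HxW depth map into an HxWx3 array of 3D points.

    Assumption: cam_to_map is a 4x4 pose matrix; identity if omitted.
    """
    if cam_to_map is None:
        cam_to_map = np.eye(4)
    h, w = depth_map.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel grid, u = column
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pixels.astype(float) @ np.linalg.inv(camera_matrix).T
    points_cam = rays * depth_map.reshape(-1, 1)      # scale rays by depth
    points_h = np.hstack([points_cam, np.ones((points_cam.shape[0], 1))])
    points_map = (points_h @ cam_to_map.T)[:, :3]     # apply position/rotation
    return points_map.reshape(h, w, 3)

# Tiny example: 2x2 depth map, identity intrinsics for easy checking.
depth = np.full((2, 2), 2.0)
points = backproject(depth, np.eye(3))

shift = np.eye(4)
shift[0, 3] = 1.0
points_shifted = backproject(depth, np.eye(3), shift)
```

Each output entry `points[v, u]` is the 3D coordinate for pixel column `u`, row `v`.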

Next, a method for generating a 3D map using stereo frames of the lightweight augmented reality device 10 according to an embodiment of the present invention will be described. FIG. 10 is a flowchart for explaining a method for generating a 3D map using stereo frames of a lightweight augmented reality device according to an embodiment of the present invention. In the embodiment of FIG. 10, it is assumed that, as described above, the model prototype has already been trained (deep learning) to produce the deep learning model (DLM) that derives a depth map from the difference in pixel coordinates between a plurality of frames.

Referring to FIG. 10, in step S110 the communication module 21 receives from the augmented reality device 10 a plurality of frames including a main frame and a sub frame of a stereo image.

Then, in step S120, the pose derivation unit 210 derives pose information and a pose matrix based on the main frame (MF) and the sub frame (SF). That is, referring to FIGS. 5 and 6, the pose derivation unit 210 extracts a feature point P representing the same object from each of the main frame (MF) and the sub frame (SF), and derives pose information from the change between the coordinates of the feature point P(m) in the main frame (MF) and the coordinates of the feature point P(s) in the sub frame (SF). In this way, the pose derivation unit 210 calculates the change in feature points to derive pose information (position and rotation information), and expresses the pose information as a matrix to derive the pose matrix.

Next, in step S130, the depth derivation unit 220 derives a depth map from the main frame (MF) and the sub frame (SF) using the pose matrix and the deep learning model (DLM). Step S130 is described in more detail below.

First, as shown in FIG. 7, the depth derivation unit 220 derives the transformation matrix (TM) using the pose matrix (PM), which the pose derivation unit 210 derived from the main frame (MF) and the sub frame (SF), and the known camera matrix (CM) of the camera unit 11 of the augmented reality device 10. Then, as shown in FIG. 8, the depth derivation unit 220 transforms the main frame (MF) using the transformation matrix (TM) to generate a simulated sub frame (SF') that imitates the sub frame (SF). Next, the depth derivation unit 220 inputs the sub frame (SF) and the simulated sub frame (SF') to the deep learning model (DLM). The deep learning model (DLM) can then derive a depth map from the coordinate difference between the pixels of the sub frame (SF) and the pixels of the simulated sub frame (SF').

Next, in step S140, the 3D map generation unit 230 generates a 3D map of the frames of the captured image using the pose information derived by the pose derivation unit 210 and the depth map derived by the depth derivation unit 220. That is, the pose information gives the position and rotation between the main frame (MF) and the sub frame (SF) of the augmented reality device 10, and the depth map gives the depth between the main frame (MF) and the sub frame (SF). The 3D map generation unit 230 can therefore use the position and rotation information together with the depth to convert the 2D coordinates of the pixels of the frame into 3D coordinates.

Next, in step S150, the 3D map generation unit 230 transmits the generated 3D map to the augmented reality device 10 through the communication module 21. Accordingly, the control unit 18 of the augmented reality device 10 may receive the 3D map through the communication unit 11. Then, in step S160, the control unit 18 loads a virtual object stored in the storage unit 17 and registers it to the image captured through the camera unit 11 according to the received 3D map, thereby generating an augmented reality image. The control unit 18 may then output the augmented reality image through the display unit 16 in step S170.
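The server-side portion of the flow (steps S120 to S140) can be sketched as a simple orchestration of three stages. This is only a structural sketch: the class `MapPipeline` and its field names are hypothetical, and the three callables stand in for the pose derivation unit, the depth derivation unit, and the 3D map generation unit.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class MapPipeline:
    """Chains the three server-side stages; each stage is injected."""
    derive_pose: Callable[[Any, Any], Any]        # stands in for unit 210
    derive_depth: Callable[[Any, Any, Any], Any]  # stands in for unit 220
    build_map: Callable[[Any, Any], Any]          # stands in for unit 230

    def run(self, main_frame: Any, sub_frame: Any) -> Any:
        pose = self.derive_pose(main_frame, sub_frame)          # S120
        depth = self.derive_depth(main_frame, sub_frame, pose)  # S130
        return self.build_map(pose, depth)                      # S140

# Stub stages that only record what they were given.
pipeline = MapPipeline(
    derive_pose=lambda mf, sf: ("pose", mf, sf),
    derive_depth=lambda mf, sf, pose: ("depth", pose[0]),
    build_map=lambda pose, depth: {"pose": pose[0], "depth": depth[0]},
)
result = pipeline.run("MF", "SF")
```

Receiving the frames (S110) and sending the map back (S150) would wrap around `run`; they are omitted because the text does not describe the transport.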

The present invention can register a virtual object to a captured image using the 3D map derived as described above. Because the 3D map of the present invention provides precise 3D coordinates, virtual objects can be registered precisely, which makes it possible to provide more realistic augmented reality. Moreover, because the 3D map is generated by the service server 20 rather than derived directly by the augmented reality device 10, the augmented reality device 10 can be made lightweight.

Meanwhile, the method according to the embodiment of the present invention described above may be implemented as a program readable by various computer means and recorded on a computer-readable recording medium. The recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above using several preferred embodiments, these embodiments are illustrative and not restrictive. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications can be made in accordance with the doctrine of equivalents without departing from the spirit of the present invention and the scope of rights set forth in the appended claims.

Claims (8)

1. A service server for generating a 3D map using stereo frames, the service server comprising:
a communication module that continuously receives, from an augmented reality device, a plurality of frames including a main frame and a sub frame of a captured image;
a pose derivation unit that obtains pose information from the main frame and the sub frame;
a depth map calculation unit that derives a depth map from the main frame and the sub frame using a deep learning model; and
a 3D map generation unit that generates a 3D map based on the pose information and the depth map and transmits the generated 3D map to the augmented reality device through the communication module.

2. The service server for generating a 3D map of claim 1, wherein the pose derivation unit extracts a feature point representing the same object from the main frame and the sub frame, and sequentially derives pose information and a pose matrix from the change between the coordinates of the feature point in the main frame and the coordinates of the feature point in the sub frame.

3. The apparatus for generating a 3D map of claim 2, wherein the depth derivation unit:
derives a transformation matrix using a known camera matrix of the augmented reality device and the pose matrix;
generates, from the main frame and using the transformation matrix, a simulated sub frame that imitates the sub frame; and
derives, through the deep learning model, a depth map according to the coordinate difference between pixels of the sub frame and pixels of the simulated sub frame.

4. The apparatus for generating a 3D map of claim 1, wherein the depth derivation unit:
derives a transformation matrix using a known camera matrix and a pose matrix derived from a training main frame and a training sub frame;
generates, from the training main frame and using the transformation matrix, a simulated training sub frame that imitates the training sub frame; and
trains a model prototype on the correlation between depth and the coordinate difference between pixels of the training sub frame and pixels of the simulated training sub frame that represent the same real-world object, thereby generating a deep learning model that derives a depth map according to the coordinate difference between the pixels of the training sub frame and the pixels of the simulated training sub frame.

5. A method for generating a 3D map using stereo frames in a service server, the method comprising:
deriving, by a pose derivation unit, pose information from a main frame and a sub frame when a plurality of stereo frames including the main frame and the sub frame of a captured image are received from an augmented reality device through a communication module;
deriving, by a depth map calculation unit, a depth map from the main frame and the sub frame using a deep learning model;
generating, by a 3D map generation unit, a 3D map based on the pose information and the depth map; and
transmitting, by the 3D map generation unit, the generated 3D map to the augmented reality device through the communication module.

6. The method for generating a 3D map of claim 5, wherein deriving the pose information from the main frame and the sub frame comprises:
extracting, by the pose derivation unit, a feature point representing the same object from each of the main frame and the sub frame; and
sequentially deriving, by the pose derivation unit, pose information and a pose matrix from the change between the coordinates of the feature point in the main frame and the coordinates of the feature point in the sub frame.

7. The method for generating a 3D map of claim 6, wherein generating the 3D map based on the pose information and the depth map comprises:
deriving, by the depth derivation unit, a transformation matrix using a known camera matrix of the augmented reality device and the pose matrix;
generating, by the depth derivation unit, from the main frame and using the transformation matrix, a simulated sub frame that imitates the sub frame; and
deriving, by the depth derivation unit, through the deep learning model, a depth map according to the coordinate difference between pixels of the sub frame and pixels of the simulated sub frame.

8. The method for generating a 3D map of claim 5, further comprising, before deriving the pose information from the main frame and the sub frame:
deriving, by the depth derivation unit, a transformation matrix using a known camera matrix and a pose matrix derived from a training main frame and a training sub frame;
generating, by the depth derivation unit, from the training main frame and using the transformation matrix, a simulated training sub frame that imitates the training sub frame; and
generating, by the depth derivation unit, a deep learning model that derives a depth map according to the coordinate difference between pixels of the training sub frame and pixels of the simulated training sub frame, by training a model prototype on the correlation between depth and the coordinate difference between pixels of the training sub frame and pixels of the simulated training sub frame that represent the same real-world object.
PCT/KR2020/015393 2020-10-15 2020-11-05 Apparatus for generating three-dimensional map by using stereo frames of lightweight augmented reality device, and method therefor Ceased WO2022080554A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020200133111A KR102525583B1 (en) 2020-10-15 2020-10-15 Apparatus for generating 3D map using stereo frame of lightweight augmented reality device and method therefor
KR10-2020-0133111 2020-10-15

Publications (1)

Publication Number Publication Date
WO2022080554A1 true WO2022080554A1 (en) 2022-04-21

Family

ID=81208221

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/015393 Ceased WO2022080554A1 (en) 2020-10-15 2020-11-05 Apparatus for generating three-dimensional map by using stereo frames of lightweight augmented reality device, and method therefor

Country Status (2)

Country Link
KR (1) KR102525583B1 (en)
WO (1) WO2022080554A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015153035A (en) * 2014-02-12 2015-08-24 日本電信電話株式会社 Subject information superimposition apparatus, subject information superimposition method, and program
US20180189974A1 (en) * 2017-05-19 2018-07-05 Taylor Clark Machine learning based model localization system
US20190051056A1 (en) * 2017-08-11 2019-02-14 Sri International Augmenting reality using semantic segmentation
KR20190089791A (en) * 2019-07-11 2019-07-31 엘지전자 주식회사 Apparatus and method for providing 3-dimensional around view
JP6560480B2 (en) * 2017-08-14 2019-08-14 楽天株式会社 Image processing system, image processing method, and program

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102647351B1 (en) * 2017-01-26 2024-03-13 삼성전자주식회사 Modeling method and modeling apparatus using 3d point cloud
KR102347586B1 (en) * 2017-12-28 2022-01-07 엘에스일렉트릭(주) Method for providing augmented reality user interface
KR20220106855A (en) 2018-03-12 2022-07-29 라인플러스 주식회사 Augmented reality provision apparatus and provision method for recognizing context by using neural network, and computer program, stored in medium, for executing same method
CN112534475B (en) * 2018-05-17 2023-01-10 奈安蒂克公司 Self-supervised training of depth estimation systems

Also Published As

Publication number Publication date
KR20220050254A (en) 2022-04-25
KR102525583B1 (en) 2023-04-26

Similar Documents

Publication Publication Date Title
US11978239B2 (en) Target detection method and apparatus, model training method and apparatus, device, and storage medium
US11989350B2 (en) Hand key point recognition model training method, hand key point recognition method and device
US12283087B2 (en) Model training method, media information synthesis method, and related apparatuses
EP2915140B1 (en) Fast initialization for monocular visual slam
CN110852942B (en) Model training method, and media information synthesis method and device
EP4390728A1 (en) Model training method and apparatus, device, medium and program product
KR20210111833A (en) Method and apparatus for acquiring positions of a target, computer device and storage medium
EP4006847A1 (en) Virtual object processing method and apparatus, and storage medium and electronic device
WO2020237611A1 (en) Image processing method and apparatus, control terminal and mobile device
CN115130456B (en) Sentence analysis and matching model training method, sentence analysis and matching model training device, sentence analysis and matching model training equipment and storage medium
KR20220145997A (en) Non-face-to-face real-time education method that uses 360-degree images and HMD, and is conducted within the metaverse space
KR102197615B1 (en) Method of providing augmented reality service and server for the providing augmented reality service
CN109561282A (en) A kind of method and apparatus of the action of ground for rendering auxiliary information
US11385856B2 (en) Synchronizing positioning systems and content sharing between multiple devices
US9483116B2 (en) Method, device, and system for providing sensory information and sense
CN112884906A (en) System and method for realizing multi-person mixed virtual and augmented reality interaction
WO2020253716A1 (en) Image generation method and device
CN117831004B (en) Method, device, equipment and medium for detecting obstacle of formula car
WO2023241642A1 (en) Rendering method and apparatus
KR102299902B1 (en) Apparatus for providing augmented reality and method therefor
WO2022080554A1 (en) Apparatus for generating three-dimensional map by using stereo frames of lightweight augmented reality device, and method therefor
CN111921199A (en) Virtual object state detection method, device, terminal and storage medium
CN108038408A (en) Intelligent-tracking method and device, storage medium and electric terminal
WO2022080553A1 (en) Device and method for constructing 3d map for providing augmented reality on basis of pose information and depth information
CN119579753A (en) A method, device and equipment for rendering three-dimensional scenes

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20957799; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 20957799; Country of ref document: EP; Kind code of ref document: A1)