WO2022095543A1

WO2022095543A1 - Image frame stitching method and apparatus, readable storage medium, and electronic device

Info

Publication number: WO2022095543A1
Application number: PCT/CN2021/113122
Authority: WO
Inventors: 施文博
Original assignee: Beike Technology Co Ltd
Current assignee: Beike Technology Co Ltd
Priority date: 2020-11-04
Filing date: 2021-08-17
Publication date: 2022-05-12
Anticipated expiration: 2023-05-04
Also published as: CN112399188A

Abstract

Disclosed are an image frame stitching method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a preview video stream captured by means of moving a panoramic photography device in a preset space; in response to multiple photography instructions received when moving the panoramic photography device, acquiring images of multiple locations in the preset space by means of the panoramic photography device, so as to obtain multiple scene images; estimating corresponding pose information of the multiple scene images on the basis of the preview video stream; and stitching the multiple scene images on the basis of the corresponding pose information of the multiple scene images, so as to obtain a panoramic image of the preset space.

Description

Image frame stitching method and device, readable storage medium and electronic device

technical field

本公开涉及计算机视觉技术，尤其涉及一种图像帧拼接方法和装置、计算机可读存储介质、电子设备及计算机程序产品。The present disclosure relates to computer vision technology, and in particular, to an image frame stitching method and apparatus, a computer-readable storage medium, an electronic device, and a computer program product.

Background technique

随着终端在人们生活中的普及和应用，用户可以采用终端进行全景图像的拍摄。相关技术中的全景图像是基于拼接多幅图像以达到广角的效果，来展现更多的场景。但对于一些特殊场景，如建筑物中的重复性纹理、墙壁的遮挡、以及相似的空间等，经常会出现错误拼接的问题。With the popularization and application of terminals in people's lives, users can use the terminal to shoot panoramic images. The panoramic image in the related art is based on splicing multiple images to achieve a wide-angle effect to show more scenes. However, for some special scenes, such as repetitive textures in buildings, occlusion of walls, and similar spaces, the problem of wrong stitching often occurs.

发明内容SUMMARY OF THE INVENTION

根据本公开实施例的一个方面，提供了一种图像帧拼接方法，包括：获取通过在设定空间中移动全景拍摄设备而拍摄的预览视频流；响应于在移动所述全景拍摄设备的过程中接收到的多个拍摄指令，通过所述全景拍摄设备获取所述设定空间中的多个位置的图像以得到多帧场景图像；基于所述预览视频流估计所述多帧场景图像的相应位姿信息；基于所述多帧场景图像的相应位姿信息，对所述多帧场景图像进行拼接以得到所述设定空间的全景图像。According to an aspect of the embodiments of the present disclosure, there is provided an image frame stitching method, comprising: acquiring a preview video stream captured by moving a panoramic photographing device in a set space; in response to receiving a video stream in the process of moving the panoramic photographing device The obtained multiple shooting instructions, obtain images of multiple positions in the set space through the panoramic shooting device to obtain multiple frames of scene images; estimate the corresponding poses of the multiple frames of scene images based on the preview video stream information; based on the corresponding pose information of the multi-frame scene images, stitching the multi-frame scene images to obtain a panoramic image of the set space.

根据本公开实施例的另一个方面，提供了一种图像帧拼接装置，包括用于实现上述图像帧拼接方法的装置。According to another aspect of the embodiments of the present disclosure, there is provided an image frame splicing apparatus, including a device for implementing the above-mentioned image frame splicing method.

根据本公开实施例的另一个方面，提供了一种计算机可读存储介质，计算机可读存储介质存储有计算机程序，计算机程序用于执行上述图像帧拼接方法。According to another aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, where the computer-readable storage medium stores a computer program, and the computer program is used to execute the above-mentioned image frame stitching method.

根据本公开实施例的另一个方面，提供了一种电子设备，电子设备包括：处理器；用于存储处理器可执行指令的存储器；处理器，用于从存储器中读取可执行指令，并执行指令以实现上述图像帧拼接方法。According to another aspect of the embodiments of the present disclosure, an electronic device is provided, the electronic device includes: a processor; a memory for storing instructions executable by the processor; a processor for reading the executable instructions from the memory, and The instructions are executed to implement the above image frame stitching method.

根据本公开实施例的另一个方面，提供了一种计算机程序产品，包括计算机程序，其中，所述计算机程序在被处理器执行时实现上述图像帧拼接方法。According to another aspect of the embodiments of the present disclosure, there is provided a computer program product including a computer program, wherein the computer program implements the above-mentioned image frame stitching method when executed by a processor.

下面通过附图和实施例，对本公开的技术方案做进一步的详细描述。The technical solutions of the present disclosure will be further described in detail below through the accompanying drawings and embodiments.

Description of drawings

通过结合附图对本公开实施例进行更详细的描述，本公开的上述以及其他目的、特征和优势将变得更加明显。附图用来提供对本公开实施例的进一步理解，并且构成说明书的一部分，与本公开实施例一起用于解释本公开，并不构成对本公开的限制。在附图中，相同的参考标号通常代表相同部件或步骤。The above and other objects, features and advantages of the present disclosure will become more apparent from the more detailed description of the embodiments of the present disclosure in conjunction with the accompanying drawings. The accompanying drawings are used to provide a further understanding of the embodiments of the present disclosure, and constitute a part of the specification, and are used to explain the present disclosure together with the embodiments of the present disclosure, and do not limit the present disclosure. In the drawings, the same reference numbers generally refer to the same components or steps.

图1是根据本公开的一个实施例的图像帧拼接方法的流程图。FIG. 1 is a flowchart of an image frame stitching method according to an embodiment of the present disclosure.

图2是根据本公开的又一个实施例的图像帧拼接方法的流程图。FIG. 2 is a flowchart of an image frame stitching method according to yet another embodiment of the present disclosure.

图3是根据本公开的再一个实施例的图像帧拼接方法的流程图。FIG. 3 is a flowchart of an image frame stitching method according to still another embodiment of the present disclosure.

图4是根据本公开的另一个实施例的图像帧拼接方法的流程图。FIG. 4 is a flowchart of an image frame stitching method according to another embodiment of the present disclosure.

图5是根据本公开的一个实施例的图像帧拼接装置的结构示意图。FIG. 5 is a schematic structural diagram of an image frame stitching apparatus according to an embodiment of the present disclosure.

图6是根据本公开一示例性实施例的电子设备的结构图。FIG. 6 is a structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed ways

下面，将参考附图详细地描述根据本公开的示例实施例。显然，所描述的实施例仅仅是本公开的一部分实施例，而不是本公开的全部实施例，应理解，本公开不受这里描述的示例实施例的限制。Hereinafter, exemplary embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the example embodiments described herein.

应注意到：除非另外具体说明，否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。It should be noted that the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.

本领域技术人员可以理解，本公开实施例中的“第一”、“第二”等术语仅用于区别不同步骤、设备或模块等，既不代表任何特定技术含义，也不表示它们之间的必然逻辑顺序。Those skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present disclosure are only used to distinguish different steps, devices, or modules, etc., and neither represent any specific technical meaning, nor represent any difference between them. the necessary logical order of .

还应理解，在本公开实施例中，“多个”可以指两个或两个以上，“至少一个”可以指一个、两个或两个以上。It should also be understood that, in the embodiments of the present disclosure, "a plurality" may refer to two or more, and "at least one" may refer to one, two or more.

还应理解，对于本公开实施例中提及的任一部件、数据或结构，在没有明确限定或者在前后文给出相反启示的情况下，一般可以理解为一个或多个。It should also be understood that any component, data or structure mentioned in the embodiments of the present disclosure can generally be understood as one or more in the case of no explicit definition or contrary indications given in the context.

另外，本公开中术语“和/或”，仅仅是一种描述关联对象的关联关系，表示可以存在三种关系，例如，A和/或B，可以表示：单独存在A，同时存在A和B，单独存在B这三种情况。另外，本公开中字符“/”，一般表示前后关联对象是一种“或”的关系。In addition, the term "and/or" in the present disclosure is only an association relationship to describe associated objects, indicating that there can be three kinds of relationships, for example, A and/or B, it can mean that A exists alone, and A and B exist at the same time , there are three cases of B alone. In addition, the character "/" in the present disclosure generally indicates that the related objects are an "or" relationship.

还应理解，本公开对各个实施例的描述着重强调各个实施例之间的不同之处，其相同或相似之处可以相互参考，为了简洁，不再一一赘述。It should also be understood that the description of the various embodiments in the present disclosure emphasizes the differences between the various embodiments, and the same or similar points can be referred to each other, and for the sake of brevity, they will not be repeated.

同时，应当明白，为了便于描述，附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。Meanwhile, it should be understood that, for the convenience of description, the dimensions of various parts shown in the accompanying drawings are not drawn in an actual proportional relationship.

以下对至少一个示例性实施例的描述实际上仅仅是说明性的，决不作为对本公开及其应用或使用的任何限制。The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application or uses in any way.

对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论，但在适当情况下，所述技术、方法和设备应当被视为说明书的一部分。Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and apparatus should be considered part of the specification.

应注意到：相似的标号和字母在下面的附图中表示类似项，因此，一旦某一项在一个附图中被定义，则在随后的附图中不需要对其进行进一步讨论。It should be noted that like numerals and letters refer to like items in the following figures, so once an item is defined in one figure, it does not require further discussion in subsequent figures.

本公开实施例可以应用于终端设备、计算机系统、服务器等电子设备，其可与众多其它通用或专用计算系统环境或配置一起操作。适于与终端设备、计算机系统、服务器等电子设备一起使用的众所周知的终端设备、计算系统、环境和/或配置的例子包括但不限于：个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统、大型计算机系统和包括上述任何系统的分布式云计算技术环境，等等。Embodiments of the present disclosure can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal equipment, computing systems, environments and/or configurations suitable for use with terminal equipment, computer systems, servers, etc. electronic equipment include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients computer, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, minicomputer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the foregoing, among others.

终端设备、计算机系统、服务器等电子设备可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常，程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等，它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施，在分布式云计算环境中，任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中，程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。Electronic devices such as terminal devices, computer systems, servers, etc., may be described in the general context of computer system-executable instructions, such as program modules, being executed by the computer system. Generally, program modules may include routines, programs, object programs, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. Computer systems/servers may be implemented in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located on local or remote computing system storage media including storage devices.

图1是根据本公开一示例性实施例的图像帧拼接方法的流程图。本实施例可应用在电子设备上，如图1所示，该图像帧拼接方法包括如下步骤：FIG. 1 is a flowchart of an image frame stitching method according to an exemplary embodiment of the present disclosure. This embodiment can be applied to electronic equipment. As shown in FIG. 1 , the image frame splicing method includes the following steps:

S102，获取通过在设定空间中移动全景拍摄设备而拍摄的预览视频流。S102: Acquire a preview video stream shot by moving the panoramic shooting device in the set space.

设定空间可以是室内的房间，也可以是室外的场所。全景拍摄设备用于表示设有全景拍摄相机和控制器的设备，其中，全景拍摄相机可以是鱼眼全景相机、多镜头全景相机或可以生成全景拍摄效果的移动客户端；控制器可以包括SLAM(即时定位与地图构建)系统。预览视频流用于表示移动全景拍摄设备初始化后生成的连续图像帧数据。The setting space may be an indoor room or an outdoor place. Panoramic shooting device is used to refer to a device provided with a panoramic shooting camera and a controller, wherein the panoramic shooting camera can be a fisheye panoramic camera, a multi-lens panoramic camera, or a mobile client that can generate panoramic shooting effects; the controller can include SLAM ( real-time positioning and map construction) system. The preview video stream is used to represent continuous image frame data generated after the mobile panorama shooting device is initialized.

S104，响应于在移动全景拍摄设备的过程中接收到的多个拍摄指令，通过全景拍摄设备获取设定空间中的多个位置的图像以得到多帧场景图像。S104, in response to the multiple shooting instructions received during the process of moving the panoramic shooting device, acquire images of multiple positions in the set space through the panoramic shooting device to obtain multiple frames of scene images.

本公开实施例还可以通过手机等远程设备实时查看预览视频流，并通过远程设备发送拍摄指令，实现远程控制。In the embodiment of the present disclosure, the preview video stream can be viewed in real time through a remote device such as a mobile phone, and a shooting instruction can be sent through the remote device to realize remote control.

S106，基于预览视频流估计多帧场景图像的相应位姿信息。S106, estimating corresponding pose information of the multi-frame scene images based on the preview video stream.

多帧场景图像的相应位姿信息用于表示与多帧场景图像中的相应场景图像对应的全景拍摄设备的位移和姿态。The corresponding pose information of the multi-frame scene images is used to represent the displacement and posture of the panoramic photographing device corresponding to the corresponding scene images in the multi-frame scene images.

S108，基于多帧场景图像的相应位姿信息，对多帧场景图像进行拼接以得到设定空间的全景图像。S108, based on the corresponding pose information of the multi-frame scene images, stitching the multi-frame scene images to obtain a panoramic image of the set space.

例如，对整套房源A进行全景拍摄。首先通过在房源A的房间A1中移动鱼眼全景相机来获取预览视频流a1。在获取预览视频流a1后，用户利用鱼眼全景相机对房间A1进行连续拍摄来获取房间A1的多个位置的图像，得到房间A1的多帧场景图像。然后，用户继续移动鱼眼全景相机到下一房间A2，按照同样的方式，对房间A2进行拍摄。直到完成对房源A中全部房间的拍摄后，基于全部预览视频流估计房源A的多帧场景图像的相应位姿信息，并根据所述相应位姿信息确定房源A的多帧场景图像之间的相互关系。将相邻的场景图像拼接起来，即可得到房源A的全景图像。For example, take a panoramic shot of the entire source A. First, the preview video stream a1 is obtained by moving the fisheye panoramic camera in the room A1 of the house A. After obtaining the preview video stream a1, the user continuously shoots the room A1 with the fisheye panoramic camera to obtain images of multiple positions of the room A1, and obtains multiple frames of scene images of the room A1. Then, the user continues to move the fisheye panoramic camera to the next room A2, and shoots the room A2 in the same way. After the shooting of all rooms in the house A is completed, the corresponding pose information of the multi-frame scene images of the house A is estimated based on all the preview video streams, and the multi-frame scene images of the house A are determined according to the corresponding pose information. interrelationships between. By stitching together adjacent scene images, a panoramic image of house A can be obtained.

根据本公开实施例的图像帧拼接方法，获取通过在设定空间中移动全景拍摄设备而拍摄的预览视频流；响应于在移动全景拍摄设备的过程中接收到的多个拍摄指令，通过全景拍摄设备获取设定空间中的多个位置的图像以得到多帧场景图像；基于预览视频流估计多帧场景图像的相应位姿信息；基于多帧场景图像的相应位姿信息，对多帧场景图像进行拼接以得到设定空间的全景图像。本公开实施例利用多帧场景图像的相应位姿信息可以有效解决全景图像中的场景图像的错误拼接问题。此外，利用全景拍摄设备的预览视频流还可以对拍摄设定空间的全景拍摄设备的全局姿态进行估计，以获取准确的全景图像。According to the image frame stitching method of the embodiment of the present disclosure, a preview video stream captured by moving a panoramic photographing device in a set space is obtained; in response to a plurality of photographing instructions received in the process of moving the panoramic photographing Obtaining images of multiple positions in the set space to obtain multi-frame scene images; estimating the corresponding pose information of the multi-frame scene images based on the preview video stream; Stitch to get a panoramic image of the set space. The embodiments of the present disclosure can effectively solve the problem of incorrect splicing of scene images in a panoramic image by using the corresponding pose information of multiple frames of scene images. In addition, by using the preview video stream of the panoramic photographing device, the global pose of the panoramic photographing device for photographing the set space can also be estimated, so as to obtain an accurate panoramic image.

在一些实施方式中，步骤S106之前还可以包括：移除预览视频流中的移动目标，以获得移除了移动目标的预览视频流，则步骤S106进一步可以包括：基于移除了移动目标的预览视频流估计多帧场景图像的相应位姿信息。In some embodiments, before step S106, it may further include: removing the moving object in the preview video stream to obtain a preview video stream with the moving object removed, then step S106 may further include: based on the preview from which the moving object is removed The video stream estimates the corresponding pose information for multiple frames of scene images.

图2是根据本公开另一示例性实施例的图像帧拼接方法的流程示意图。上述移除预览视频流中的移动目标，可以包括如下步骤：FIG. 2 is a schematic flowchart of an image frame stitching method according to another exemplary embodiment of the present disclosure. The above-mentioned removal of the moving target in the preview video stream may include the following steps:

S201，对预览视频流中的场景图像进行移动目标检测以确定是否检测到移动目标。S201. Perform moving object detection on a scene image in a preview video stream to determine whether a moving object is detected.

移动目标可以是人或动物。The moving target can be a person or an animal.

S202，响应于检测到移动目标，基于预设的第二神经网络，移除移动目标。S202, in response to detecting the moving target, remove the moving target based on a preset second neural network.

本公开实施例可以通过特征点检测确定是否存在移动目标。预设的第二神经网络用于表示检测移动目标以及移除移动目标的神经网络，例如，SSD(单个深度神经网络模型，Single Shot MultiBox Detector)、Yolo(一眼就能认出你模型，You Only look once)，Deeplab(空洞卷积模型)。The embodiments of the present disclosure can determine whether there is a moving target through feature point detection. The preset second neural network is used to represent the neural network that detects moving objects and removes moving objects, for example, SSD (single deep neural network model, Single Shot MultiBox Detector), Yolo (you can recognize your model at a glance, You Only look once), Deeplab (a hole convolution model).

本公开实施例可以移除场景图像中多余的移动目标，使得图像帧信息更加完整准确。The embodiment of the present disclosure can remove redundant moving objects in the scene image, so that the image frame information is more complete and accurate.

图3是根据本公开另一示例性实施例的图像帧拼接方法的流程示意图，在上述图1所示实施例的基础上，步骤S106具体可以包括如下步骤：FIG. 3 is a schematic flowchart of an image frame stitching method according to another exemplary embodiment of the present disclosure. On the basis of the embodiment shown in FIG. 1 above, step S106 may specifically include the following steps:

S301，基于即时定位与建图算法和回环检测算法，对全景拍摄设备的运动轨迹进行处理，以估计与预览视频流中的场景图像对应的全景拍摄设备的位姿信息。S301 , based on the real-time positioning and mapping algorithm and the loop closure detection algorithm, process the motion trajectory of the panoramic photographing device to estimate the pose information of the panoramic photographing device corresponding to the scene image in the preview video stream.

即时定位与建图(SLAM)算法和回环检测算法被预存在即时定位与建图(SLAM)系统中。即时定位与建图(SLAM)算法的目的是估计全景拍摄设备的运动轨迹中的各个时刻的位姿；回环检测算法的目的是找到当前场景在历史中是否出现过，如果出现过，就可以相应提供一个非常强的约束条件，即把偏离较大的全景拍摄设备轨迹修正到正确的位置上。The Live Localization and Mapping (SLAM) algorithm and the loop closure detection algorithm are pre-stored in the Live Localization and Mapping (SLAM) system. The purpose of the real-time localization and mapping (SLAM) algorithm is to estimate the pose at each moment in the motion trajectory of the panoramic shooting device; the purpose of the loop closure detection algorithm is to find out whether the current scene has appeared in history, and if it has occurred, it can be used accordingly. Provide a very strong constraint, that is, correct the trajectory of the panorama shooting equipment with a large deviation to the correct position.

S302，基于与预览视频流中的场景图像对应的全景拍摄设备的位姿信息，获取多帧场景图像的相应位姿信息。S302 , based on the pose information of the panoramic photographing device corresponding to the scene image in the preview video stream, obtain corresponding pose information of multiple frames of scene images.

由此，本公开实施例利用即时定位与建图算法和回环检测算法可以对各个时刻的全景拍摄设备的位姿信息进行估计，从而实现对场景图像之间的相对位移和相对旋转的估计，以保证各帧场景图像之间的顺畅跳转。Therefore, the embodiments of the present disclosure can use the real-time positioning and mapping algorithm and the loop closure detection algorithm to estimate the pose information of the panoramic shooting device at each moment, so as to realize the estimation of the relative displacement and relative rotation between the scene images, so as to Ensure smooth jumping between scene images of each frame.

在一些实施方式中，步骤S108之前还可以包括如下步骤：获取全景拍摄设备的位姿尺度，则步骤108还可以包括：基于全景拍摄设备的位姿尺度和多帧场景图像的相应位姿信息，对多帧场景图像进行拼接。In some embodiments, before step S108, the following steps may be further included: acquiring the pose scale of the panoramic shooting device, then step 108 may further include: based on the pose scale of the panoramic shooting device and the corresponding pose information of the multi-frame scene images, Stitches multi-frame scene images.

位姿尺度用于表示多帧场景图像中的图上距离与设定空间中对应的实际距离之比。The pose scale is used to represent the ratio of the on-map distance in the multi-frame scene image to the corresponding actual distance in the set space.

在一些实施方式中，上述获取全景拍摄设备的位姿尺度可以包括如下步骤：基于全景拍摄设备与固定参照物之间的实际距离，获取全景拍摄设备的位姿尺度；或基于预设的第一神经网络，对所述预览视频流进行处理以获取全景拍摄设备的位姿尺度。In some embodiments, obtaining the pose scale of the panoramic shooting device may include the following steps: obtaining the pose scale of the panoramic shooting device based on the actual distance between the panoramic shooting device and the fixed reference object; or based on a preset first The neural network processes the preview video stream to obtain the pose scale of the panoramic shooting device.

固定参照物可以是房间的地面或天花板。例如，设定观测点与多帧场景图像中的地面之间的距离为1，安放在三角架上的鱼眼全景相机与地面之间的实际距离为1.5米，则观测点与多帧场景图像中的地面之间的距离与安放在三角架上的鱼眼全景相机与地面之间的实际距离之比为1:1.5；或，通过预设的第一神经网络即获取深度信息的神经网络，对预览视频流进行处理以确定鱼眼全景相机的位姿尺度，例如：将预览视频流数据输入由测试集训练后得到的卷积神经网络模型，即可得到鱼眼全景相机的位姿尺度。A fixed reference can be the floor or ceiling of a room. For example, if the distance between the observation point and the ground in the multi-frame scene image is 1, and the actual distance between the fisheye panoramic camera placed on the tripod and the ground is 1.5 meters, the observation point and the multi-frame scene image The ratio between the distance between the ground and the actual distance between the fisheye panoramic camera placed on the tripod and the ground is 1:1.5; or, through the preset first neural network, that is, the neural network that obtains depth information, The preview video stream is processed to determine the pose scale of the fisheye panoramic camera. For example, by inputting the preview video stream data into the convolutional neural network model trained on the test set, the pose scale of the fisheye panoramic camera can be obtained.

本公开实施例通过全景拍摄设备与固定参照物之间的实际距离或将预览视频流输入预设的第一神经网络的方式，获取全景拍摄设备的位姿尺度，以确定多帧场景图像中的信息与实际场景中信息的距离对应关系。In this embodiment of the present disclosure, the pose scale of the panoramic photographing device is obtained by using the actual distance between the panoramic photographing device and the fixed reference object or by inputting the preview video stream into a preset first neural network, so as to determine the position and orientation of the panoramic photographing device in the multi-frame scene images. The distance correspondence between the information and the information in the actual scene.

图4是根据本公开另一示例性实施例的图像帧拼接方法的流程示意图，在上述图1所示实施例的基础上，步骤S108具体可以包括如下步骤：FIG. 4 is a schematic flowchart of an image frame stitching method according to another exemplary embodiment of the present disclosure. On the basis of the embodiment shown in FIG. 1 above, step S108 may specifically include the following steps:

S401，基于多帧场景图像的相应位姿信息，确定多帧场景图像的拼接顺序。S401: Determine the splicing sequence of the multi-frame scene images based on the corresponding pose information of the multi-frame scene images.

多帧场景图像的拼接顺序用于表示全景拍摄设备对应的位姿连续变化的顺序，即平移坐标的变化和旋转坐标的变化。The splicing sequence of the multi-frame scene images is used to represent the sequence of continuous changes of the pose corresponding to the panoramic photographing device, that is, the change of translation coordinates and the change of rotation coordinates.

S402，基于多帧场景图像的拼接顺序，确定设定空间的全景图像。S402: Determine a panoramic image of the set space based on the splicing sequence of the multiple frames of scene images.

在一些实施方式中，若多帧场景图像中存在具有图像重叠的场景图像，则对图像重叠的部分进行图像融合处理。In some embodiments, if there are scene images with overlapping images in the multiple frames of scene images, image fusion processing is performed on the overlapping part of the images.

例如，基于多帧场景图像的拼接顺序，将相邻图像帧中存在重叠的部分进行融合处理后，按照拼接顺序，将图像帧拼接至一起，得到全景图像。此外，本公开实施例还可以将该全景图像投射至球面、柱面或立方体上，以实现全方位的视图浏览。For example, based on the splicing sequence of multiple frames of scene images, after the overlapping parts in adjacent image frames are fused, the image frames are spliced together according to the splicing sequence to obtain a panoramic image. In addition, the embodiment of the present disclosure can also project the panoramic image onto a spherical surface, a cylindrical surface or a cube, so as to realize all-round view browsing.

本公开实施例利用多帧场景图像的位姿信息，对多帧场景图像进行拼接，有效解决了全景拍摄设备遇到相似空间的不同全景图像时，容易给出错误估计，以至于出现多帧场景图像错误拼接的问题。The embodiments of the present disclosure use the pose information of the multi-frame scene images to stitch the multi-frame scene images, which effectively solves the problem that when the panoramic shooting device encounters different panoramic images in a similar space, it is easy to give a wrong estimate, so that a multi-frame scene occurs. The problem of wrong image stitching.

本公开实施例提供的任一种图像帧拼接方法可以由任意适当的具有数据处理能力的设备执行，包括但不限于：终端设备和服务器等。或者，本公开实施例提供的任一种图像帧拼接方法可以由处理器执行，如处理器通过调用存储器存储的相应指令来执行本公开实施例提及的任一种图像帧拼接方法。下文不再赘述。Any image frame stitching method provided by the embodiments of the present disclosure may be executed by any appropriate device with data processing capabilities, including but not limited to: terminal devices and servers. Alternatively, any image frame stitching method provided by the embodiments of the present disclosure may be executed by a processor, for example, the processor executes any of the image frame stitching methods mentioned in the embodiments of the present disclosure by invoking corresponding instructions stored in the memory. No further description will be given below.

图5是根据本公开一示例性实施例的图像帧拼接装置的结构示意图。该装置可以设置于终端设备、服务器等电子设备中，以执行本公开上述任一实施例的图像帧拼接方法。如图5所示，该装置包括：FIG. 5 is a schematic structural diagram of an image frame stitching apparatus according to an exemplary embodiment of the present disclosure. The apparatus can be set in electronic equipment such as terminal equipment and servers, so as to execute the image frame stitching method of any of the above-mentioned embodiments of the present disclosure. As shown in Figure 5, the device includes:

第一获取模块51，被配置为获取通过在设定空间中移动全景拍摄设备而拍摄的预览视频流；The first obtaining module 51 is configured to obtain a preview video stream captured by moving the panoramic shooting device in the set space;

第一得到模块52，被配置为响应于在移动所述全景拍摄设备的过程中接收到的多个拍摄指令，通过所述全景拍摄设备获取所述设定空间中的多个位置的图像以得到多帧场景图像；The first obtaining module 52 is configured to, in response to multiple shooting instructions received in the process of moving the panoramic shooting device, obtain images of multiple positions in the set space through the panoramic shooting device to obtain Multi-frame scene images;

估计模块53，被配置为基于所述预览视频流估计所述多帧场景图像的相应位姿信息；以及an estimation module 53, configured to estimate the corresponding pose information of the multi-frame scene images based on the preview video stream; and

第二得到模块54，被配置为基于所述多帧场景图像的相应位姿信息，对所述多帧场景图像进行拼接以得到所述设定空间的全景图像。The second obtaining module 54 is configured to stitch the multiple frames of scene images based on the corresponding pose information of the multiple frames of scene images to obtain a panoramic image of the set space.

基于本公开上述实施例提供的图像帧拼接装置，获取通过在设定空间中移动全景拍摄设备而拍摄的预览视频流；响应于在移动全景拍摄设备的过程中接收到的多个拍摄指令，通过全景拍摄设备获取设定空间中的多个位置的图像以得到多帧场景图像；基于预览视频流估计多帧场景图像的相应位姿信息；基于多帧场景图像的相应位姿信息，对多帧场景图像进行拼接以得到设定空间的全景图像。本公开实施例利用多帧场景图像的相应位姿信息图像可以有效解决全景图像中的场景图像的错误拼接问题。此外，利用全景拍摄设备中的预览视频流还可以对拍摄设定空间的全景拍摄设备的全局姿态进行估计，以获取准确的全景图像。Based on the image frame stitching apparatus provided by the above-mentioned embodiments of the present disclosure, a preview video stream captured by moving a panoramic photographing device in a set space is obtained; The shooting device acquires images of multiple positions in the set space to obtain multi-frame scene images; estimates the corresponding pose information of the multi-frame scene images based on the preview video stream; The images are stitched to obtain a panoramic image of the set space. The embodiments of the present disclosure can effectively solve the problem of incorrect splicing of scene images in panoramic images by using corresponding pose information images of multiple frames of scene images. In addition, by using the preview video stream in the panoramic photographing device, the global pose of the panoramic photographing device for photographing the set space can also be estimated, so as to obtain an accurate panoramic image.

在一些实施方式中，所述估计模块53包括：In some embodiments, the estimation module 53 includes:

移除单元，被配置为移除所述预览视频流中的移动目标，以得到移除了移动目标的预览视频流；以及a removing unit configured to remove a moving object in the preview video stream to obtain a preview video stream with the moving object removed; and

第一估计单元，被配置为基于移除了移动目标的预览视频流，估计所述多帧场景图像的相应位姿信息。The first estimation unit is configured to estimate the corresponding pose information of the multi-frame scene images based on the preview video stream from which the moving object is removed.

在一些实施方式中，所述移除单元包括：In some embodiments, the removal unit includes:

第一确定单元，被配置为对所述预览视频流中的场景图像进行移动目标检测以确定是否检测到移动目标；以及a first determining unit configured to perform moving object detection on the scene image in the preview video stream to determine whether a moving object is detected; and

处理单元，被配置为响应于检测到移动目标，基于预设的第二神经网络，移除所述移动目标。The processing unit is configured to, in response to detecting the moving target, remove the moving target based on a preset second neural network.

第二估计单元，被配置为基于即时定位与建图算法和回环检测算法，对所述全景拍摄设备的运动轨迹进行处理，以估计与所述预览视频流中的场景图像对应的全景拍摄设备的位姿信息；以及The second estimation unit is configured to process the motion trajectory of the panoramic photographing device based on the real-time positioning and mapping algorithm and the loop closure detection algorithm, so as to estimate the motion trajectory of the panoramic photographing device corresponding to the scene image in the preview video stream. pose information; and

第一获取单元，被配置为基于与所述预览视频流中的场景图像对应的全景拍摄设备的位姿信息，获取所述多帧场景图像的相应位姿信息。The first obtaining unit is configured to obtain the corresponding pose information of the multi-frame scene images based on the pose information of the panoramic photographing device corresponding to the scene images in the preview video stream.

在一些实施方式中，所述第二得到模块54包括：In some embodiments, the second obtaining module 54 includes:

第二获取单元，被配置为获取所述全景拍摄设备的位姿尺度，其中，所述位姿尺度用于表示所述多帧场景图像中的图上距离与所述设定空间中的对应实际距离之比；以及The second obtaining unit is configured to obtain the pose scale of the panoramic photographing device, wherein the pose scale is used to represent the distance on the map in the multi-frame scene images and the corresponding actual distance in the set space distance ratio; and

拼接单元，被配置为基于所述全景拍摄设备的位姿尺度和所述多帧场景图像的相应位姿信息，对所述多帧场景图像进行拼接以得到所述设定空间的全景图像。The stitching unit is configured to stitch the multiple frames of scene images based on the pose scale of the panoramic photographing device and the corresponding pose information of the multiple frames of scene images to obtain a panoramic image of the set space.

在一些实施方式中，所述第二获取单元被配置为：In some embodiments, the second obtaining unit is configured to:

基于所述全景拍摄设备与固定参照物之间的实际距离，获取所述全景拍摄设备的位姿尺度；或Obtain the pose scale of the panoramic photographing device based on the actual distance between the panoramic photographing device and a fixed reference object; or

基于预设的第一神经网络，对所述预览视频流进行处理以获取所述全景拍摄设备的位姿尺度。Based on the preset first neural network, the preview video stream is processed to obtain the pose scale of the panoramic shooting device.

第二确定单元，被配置为基于所述多帧场景图像的相应位姿信息，确定所述多帧场景图像的拼接顺序；以及a second determining unit configured to determine the splicing sequence of the multi-frame scene images based on the corresponding pose information of the multi-frame scene images; and

第三确定单元，被配置为基于所述多帧场景图像的拼接顺序，确定所述设定空间的全景图像。The third determination unit is configured to determine the panoramic image of the set space based on the splicing sequence of the multiple frames of scene images.

在一些实施方式中，还包括：In some embodiments, it also includes:

融合模块，被配置为响应于确定所述多帧场景图像中的至少一个场景图像存在图像重叠，对所述图像重叠的部分进行图像融合处理。The fusion module is configured to, in response to determining that at least one scene image in the multiple frames of scene images has image overlap, perform image fusion processing on the overlapping portion of the images.

下面，参考图6来描述根据本公开实施例的电子设备。该电子设备可以是第一设备和第二设备中的任一个或两者、或与它们独立的单机设备，该单机设备可以与第一设备和第二设备进行通信，以从它们接收所采集到的输入信号。Hereinafter, an electronic device according to an embodiment of the present disclosure will be described with reference to FIG. 6 . The electronic device may be either or both of the first device and the second device, or a stand-alone device independent of them that can communicate with the first device and the second device to receive the collected data from them input signal.

图6图示了根据本公开实施例的电子设备的框图。6 illustrates a block diagram of an electronic device according to an embodiment of the present disclosure.

如图6所示，电子设备60包括一个或多个处理器61和存储器62。As shown in FIG. 6 , electronic device 60 includes one or more processors 61 and memory 62 .

处理器61可以是中央处理单元(CPU)或者具有数据处理能力和/或指令执行能力的其他形式的处理单元，并且可以控制电子设备60中的其他组件以执行期望的功能。Processor 61 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in electronic device 60 to perform desired functions.

存储器62可以包括一个或多个计算机程序产品，所述计算机程序产品可以包括各种形式的计算机可读存储介质，例如易失性存储器和/或非易失性存储器。所述易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。所述非易失性存储器例如可以包括只读存储器(ROM)、硬盘、闪存等。在所述计算机可读存储介质上可以存储一个或多个计算机程序指令，处理器61可以运行所述程序指令，以实现上文所述的本公开的各个实施例的图像帧拼接方法以及/或者其他期望的功能。在所述计算机可读存储介质中还可以存储诸如输入信号、信号分量、噪声分量等各种内容。Memory 62 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory, or the like. The non-volatile memory may include, for example, read only memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 61 may execute the program instructions to implement the image frame stitching method and/or the various embodiments of the present disclosure described above. Other desired features. Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.

在一个示例中，电子设备60还可以包括：输入装置63和输出装置64，这些组件通过总线系统和/或其他形式的连接机构(未示出)互连。In one example, the electronic device 60 may also include an input device 63 and an output device 64 interconnected by a bus system and/or other form of connection mechanism (not shown).

例如，在该电子设备是第一设备或第二设备时，该输入装置63可以是上述的麦克风或麦克风阵列，用于捕捉声源的输入信号。在该电子设备是单机设备时，该输入装置63可以是通信网络连接器，用于从第一设备和第二设备接收所采集的输入信号。For example, when the electronic device is the first device or the second device, the input device 63 may be the above-mentioned microphone or microphone array for capturing the input signal of the sound source. When the electronic device is a stand-alone device, the input device 63 may be a communication network connector for receiving the collected input signals from the first device and the second device.

此外，该输入设备63还可以包括例如键盘、鼠标等等。In addition, the input device 63 may also include, for example, a keyboard, a mouse, and the like.

该输出装置64可以向外部输出各种信息，包括确定出的距离信息、方向信息等。该输出设备64可以包括例如显示器、扬声器、打印机、以及通信网络及其所连接的远程输出设备等等。The output device 64 can output various information to the outside, including the determined distance information, direction information, and the like. The output devices 64 may include, for example, displays, speakers, printers, and communication networks and their connected remote output devices, among others.

当然，为了简化，图6中仅示出了该电子设备60中与本公开有关的组件中的一些，省略了诸如总线、输入/输出接口等等的组件。除此之外，根据具体应用情况，电子设备60还可以包括任何其他适当的组件。Of course, for simplicity, only some of the components in the electronic device 60 related to the present disclosure are shown in FIG. 6 , and components such as buses, input/output interfaces, and the like are omitted. Besides, the electronic device 60 may also include any other suitable components according to the specific application.

除了上述方法和设备以外，本公开的实施例还可以是计算机程序产品，其包括计算机程序指令，所述计算机程序指令在被处理器运行时使得所述处理器执行本说明书上述“示例性方法”部分中描述的根据本公开各种实施例的图像帧拼接方法中的步骤。In addition to the methods and apparatus described above, embodiments of the present disclosure may also be computer program products comprising computer program instructions that, when executed by a processor, cause the processor to perform the "exemplary method" described above in this specification The steps in the image frame stitching method according to various embodiments of the present disclosure described in the section.

所述计算机程序产品可以以一种或多种程序设计语言的任意组合来编写用于执行本公开实施例操作的程序代码，所述程序设计语言包括面向对象的程序设计语言，诸如Java、C++等，还包括常规的过程式程序设计语言，诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算设备上执行、部分地在用户设备上执行、作为一个独立的软件包执行、部分在用户计算设备上部分在远程计算设备上执行、或者完全在远程计算设备或服务器上执行。The computer program product may write program code for performing operations of embodiments of the present disclosure in any combination of one or more programming languages, including object-oriented programming languages, such as Java, C++, etc. , also includes conventional procedural programming languages, such as "C" language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server execute on.

此外，本公开的实施例还可以是计算机可读存储介质，其上存储有计算机程序指令，所述计算机程序指令在被处理器运行时使得所述处理器执行本说明书上述“示例性方法”部分中描述的根据本公开各种实施例的图像帧拼接方法中的步骤。In addition, embodiments of the present disclosure may also be computer-readable storage media having computer program instructions stored thereon that, when executed by a processor, cause the processor to perform the above-described "Example Method" section of this specification The steps in the image frame stitching method according to various embodiments of the present disclosure described in .

所述计算机可读存储介质可以采用一个或多个可读介质的任意组合。可读介质可以是可读信号介质或者可读存储介质。可读存储介质例如可以包括但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件，或者任意以上的组合。可读存储介质的更具体的例子(非穷举的列表)包括：具有一个或多个导线的电连接、便携式盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples (non-exhaustive list) of readable storage media include: electrical connections with one or more wires, portable disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.

以上结合具体实施例描述了本公开的基本原理，但是，需要指出的是，在本公开中提及的优点、优势、效果等仅是示例而非限制，不能认为这些优点、优势、效果等是本公开的各个实施例必须具备的。另外，上述公开的具体细节仅是为了示例的作用和便于理解的作用，而非限制，上述细节并不限制本公开为必须采用上述具体的细节来实现。The basic principles of the present disclosure have been described above with reference to specific embodiments. However, it should be pointed out that the advantages, advantages, effects, etc. mentioned in the present disclosure are only examples rather than limitations, and these advantages, advantages, effects, etc. should not be considered to be A must-have for each embodiment of the present disclosure. In addition, the specific details disclosed above are only for the purpose of example and easy understanding, but not for limitation, and the above details do not limit the present disclosure to be implemented by using the above specific details.

本说明书中各个实施例均采用递进的方式描述，每个实施例重点说明的都是与其它实施例的不同之处，各个实施例之间相同或相似的部分相互参见即可。对于系统实施例而言，由于其与方法实施例基本对应，所以描述的比较简单，相关之处参见方法实施例的部分说明即可。The various embodiments in this specification are described in a progressive manner, and each embodiment focuses on the differences from other embodiments, and the same or similar parts between the various embodiments may be referred to each other. As for the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and for related parts, please refer to the partial description of the method embodiment.

本公开中涉及的器件、装置、设备、系统的方框图仅作为例示性的例子并且不意图要求或暗示必须按照方框图示出的方式进行连接、布置、配置。如本领域技术人员将认识到的，可以按任意方式连接、布置、配置这些器件、装置、设备、系统。诸如“包括”、“包含”、“具有”等等的词语是开放性词汇，指“包括但不限于”，且可与其互换使用。这里所使用的词汇“或”和“和”指词汇“和/或”，且可与其互换使用，除非上下文明确指示不是如此。这里所使用的词汇“诸如”指词组“诸如但不限于”，且可与其互换使用。The block diagrams of devices, apparatus, apparatuses, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will appreciate, these means, apparatuses, apparatuses, systems may be connected, arranged, and configured in any manner. Words such as "including", "including", "having" and the like are open-ended words meaning "including but not limited to" and are used interchangeably therewith. As used herein, the words "or" and "and" refer to and are used interchangeably with the word "and/or" unless the context clearly dictates otherwise. As used herein, the word "such as" refers to and is used interchangeably with the phrase "such as but not limited to".

可能以许多方式来实现本公开的方法和装置。例如，可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和装置。用于所述方法的步骤的上述顺序仅是为了进行说明，本公开的方法的步骤不限于以上具体描述的顺序，除非以其它方式特别说明。此外，在一些实施例中，还可将本公开实施为记录在记录介质中的程序，这些程序包括用于实现根据本公开的方法的机器可读指令。因而，本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。The methods and apparatus of the present disclosure may be implemented in many ways. For example, the methods and apparatus of the present disclosure may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure can also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.

还需要指出的是，在本公开的装置、设备和方法中，各部件或各步骤是可以分解和/或重新组合的。这些分解和/或重新组合应视为本公开的等效方案。It should also be noted that, in the apparatus, device and method of the present disclosure, each component or each step may be decomposed and/or recombined. These disaggregations and/or recombinations should be considered equivalents of the present disclosure.

提供所公开的方面的以上描述以使本领域的任何技术人员能够做出或者使用本公开。对这些方面的各种修改对于本领域技术人员而言是非常显而易见的，并且在此定义的一般原理可以应用于其他方面而不脱离本公开的范围。因此，本公开不意图被限制到在此示出的方面，而是按照与在此公开的原理和新颖的特征一致的最宽范围。The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

为了例示和描述的目的已经给出了以上描述。此外，此描述不意图将本公开的实施例限制到在此公开的形式。尽管以上已经讨论了多个示例方面和实施例，但是本领域技术人员将认识到其某些变型、修改、改变、添加和子组合。The foregoing description has been presented for the purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions and sub-combinations thereof.

Claims

An image frame splicing method, comprising:

Get the preview video stream shot by moving the panorama shooting device in the set space;

In response to a plurality of shooting instructions received in the process of moving the panoramic shooting device, acquiring images of multiple positions in the set space through the panoramic shooting device to obtain multiple frames of scene images;

Estimating corresponding pose information of the multi-frame scene images based on the preview video stream; and

Based on the corresponding pose information of the multi-frame scene images, the multi-frame scene images are stitched to obtain a panoramic image of the set space.

The method according to claim 1, wherein the estimating the corresponding pose information of the multi-frame scene images based on the preview video stream comprises:

removing the moving object in the preview video stream to obtain a preview video stream with the moving object removed; and

Based on the preview video stream with the moving object removed, corresponding pose information of the multi-frame scene images is estimated.

The method according to claim 2, wherein the removing the moving object in the preview video stream comprises:

performing moving object detection on scene images in the preview video stream to determine whether a moving object is detected; and

In response to detecting the moving object, the moving object is removed based on a preset second neural network.

The method according to any one of claims 1-3, wherein the estimating the corresponding pose information of the multi-frame scene images based on the preview video stream comprises:

processing the motion trajectory of the panoramic shooting device based on the real-time positioning and mapping algorithm and the loop closure detection algorithm to estimate the pose information of the panoramic shooting device corresponding to the scene image in the preview video stream; and

Based on the pose information of the panoramic photographing device corresponding to the scene image in the preview video stream, the corresponding pose information of the multiple frames of scene images is acquired.

The method according to any one of claims 1-4, wherein the multi-frame scene images are spliced based on the corresponding pose information of the multi-frame scene images to obtain a panoramic image of the set space, include:

acquiring the pose scale of the panoramic photographing device, wherein the pose scale is used to represent the ratio of the distance on the map in the multi-frame scene images to the corresponding actual distance in the set space; and

Based on the pose scale of the panoramic photographing device and the corresponding pose information of the multi-frame scene images, the multi-frame scene images are stitched to obtain the panorama image of the set space.

The method according to claim 5, wherein the acquiring the pose scale of the panoramic photographing device comprises:

Obtain the pose scale of the panoramic photographing device based on the actual distance between the panoramic photographing device and a fixed reference object; or

Based on the preset first neural network, the preview video stream is processed to obtain the pose scale of the panoramic shooting device.

determining a splicing sequence of the multi-frame scene images based on the corresponding pose information of the multi-frame scene images; and

Based on the splicing sequence of the multiple frames of scene images, a panoramic image of the set space is determined.

The method according to any one of claims 1-7, further comprising:

In response to determining that there is image overlap in at least one of the multiple frames of scene images, image fusion processing is performed on the overlapped portion of the images.

An image frame splicing device, comprising: a device for implementing the method of any one of claims 1-8.

A computer-readable storage medium, wherein the storage medium stores a computer program for performing the method of any one of claims 1-8.

An electronic device comprising:

processor; and

A memory for storing instructions executable by the processor, wherein the executable instructions, when executed by the processor, implement the method of any of claims 1-8.

A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-8.