Detailed Description
The technical solutions of the present invention will be clearly and completely described below in conjunction with specific embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention is suitable for large-scale visual information acquisition scenes that require both a wide field of view and clear detail in the central area. For example, in the field of security monitoring, the system can monitor large public places such as airport waiting halls, railway station squares and large shopping malls in real time, capturing clear individual features in crowded spots (handled by the central lens) while covering wide-field dynamics such as peripheral passages and entrances (handled by the fisheye lenses). In the intelligent traffic field, the system can be installed at complex intersections, where the central lens focuses on the signal lights, the license plates of vehicles in the main lanes and driving-state details, and the fisheye lenses cover the traffic flow and pedestrian conditions of the whole intersection and the surrounding branch roads. For virtual reality / augmented reality content acquisition, the system can provide wide yet detailed image material for creating immersive scenes, meeting the requirements of both panoramic immersion and close-ups of key elements. It can also be used for panoramic mapping of local areas in geographical surveying, for example recording the terrain and landforms around urban landmarks while capturing both the details of the building itself and the state of the surrounding environment.
The multi-lens large-view-field high-resolution imaging method can be integrated in imaging equipment. The imaging equipment may be a professional-grade industrial camera assembly designed as a stand-alone device with a protective housing and a stable mounting structure, suitable for long-term stable operation in harsh outdoor environments; it may also be a core imaging module embedded into intelligent terminal equipment (such as a high-end smartphone or a professional tablet used for on-site survey recording), a vehicle-mounted intelligent vision system, an unmanned aerial vehicle aerial-photography gimbal camera assembly, and the like.
The imaging device used by the multi-lens large-view-field high-resolution imaging method comprises a central lens and four fisheye lenses. The four fisheye lenses are geometrically distributed around the central lens, the fields of view of the central lens and the four fisheye lenses overlap one another, the central lens is used for capturing an image of the central area, the optical resolution of the central lens is higher than that of the fisheye lenses, and the fisheye lenses are used for capturing large-field-of-view images of the peripheral area.
Specifically, the central lens is a fixed-focus lens with high optical resolution, for example a Canon EF 50mm f/1.2L USM. Its optical resolution can clearly capture the tiny details of the central area; when shooting a cultural-relic restoration scene, it can accurately present the textures and color changes of the relic surface, the fine working parts of the restoration tools, and the like. The four fisheye lenses are of a type suitable for capturing a large field of view, such as the Nikon AF DX Fisheye-Nikkor 10.5mm f/2.8G ED. They are uniformly distributed in a circular geometry around the central lens, and the included angle between the optical axes of adjacent fisheye lenses is 90 degrees, so that there is no blind spot in the horizontal or vertical direction and the peripheral area is covered in all directions.
A multi-lens large-view-field high-resolution imaging method comprises the following steps of S10-S60.
S10, synchronously acquiring images acquired by the center lens and each fisheye lens.
In this embodiment, synchronous acquisition of the images collected by the central lens and each fisheye lens is the basis for subsequent high-quality imaging. All lenses (namely the central lens and the fisheye lenses) are triggered to expose synchronously, which can be realized through hardware triggering or software synchronization; this ensures that all images are acquired at the same moment and guarantees temporal consistency, after which the image data are read out from all lenses.
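As an illustrative software-synchronization sketch only (a hardware trigger would replace the barrier with an electrical trigger signal), assuming a hypothetical Lens.grab() capture interface:

```python
import threading

class Lens:
    """Hypothetical wrapper around one camera/lens; grab() is assumed to return one frame."""
    def __init__(self, name):
        self.name = name
    def grab(self):
        ...  # read a frame from the sensor (placeholder)

def acquire_synchronously(lenses):
    # All threads wait at the barrier, then start exposure at (approximately) the same instant.
    barrier = threading.Barrier(len(lenses))
    frames = [None] * len(lenses)

    def worker(i, lens):
        barrier.wait()           # software synchronization point
        frames[i] = lens.grab()  # exposure starts right after release

    threads = [threading.Thread(target=worker, args=(i, l)) for i, l in enumerate(lenses)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return frames  # [center, fisheye1 .. fisheye4]

# Usage sketch:
# frames = acquire_synchronously([Lens("center")] + [Lens(f"fisheye{i}") for i in range(4)])
```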
S20, carrying out distortion correction processing on the images acquired by the fisheye lenses based on camera calibration parameters so as to obtain corrected fisheye images.
In one embodiment, step S20 includes the steps of S201-S203.
S201, acquiring camera calibration parameters of each fish-eye lens.
In this embodiment, a calibration field is constructed and a checkerboard calibration plate is selected, with the edge-length precision of the checkerboard controlled to within ±0.05 mm to ensure the accuracy of the calibration standard. The calibration field is provided with a uniform, stable illumination source with a constant color temperature (for example 5500K), so that shadows or reflection interference caused by uneven light do not affect image acquisition and degrade the calibration precision. The four fisheye lenses are mounted in turn on a pan-tilt fixture with high-precision displacement adjustment; the fixture precision reaches ±0.1° in rotation and ±0.1 mm in translation, making it convenient to adjust the relative position of the lens and the calibration plate and ensuring that clear, complete calibration images are acquired under different postures.
For each fisheye lens, multiple combinations of shooting positions and angles are set. For example, at distances of 0.5 m, 1 m and 2 m from the calibration plate, images are taken at horizontal angles of ±30° and ±60° and vertical angles of ±20° and ±40°, with at least 10 images at each position-angle combination, ensuring that the lens is covered in its various possible imaging states; in total about 60 calibration images are taken per lens.
The Zhang Zhengyou (Zhang's) calibration algorithm is applied and optimized in combination with the characteristics of the fisheye lens. For the intrinsic parameters, the correspondence between the pixel coordinates and the world coordinates of the checkerboard corner points in the images is used, and parameters such as the focal length (for example, the focal length of one fisheye lens is finally determined as 8 mm with an error range of ±0.02 mm), the principal-point coordinates (the principal-point X coordinate accurate to ±0.5 pixel, with the same accuracy for the Y coordinate) and the radial distortion coefficients (the second-order and third-order radial distortion coefficients generally accurate to ±0.001) are solved precisely by minimizing the reprojection error over multiple iterations. For the extrinsic parameters, based on the obtained intrinsic parameters and the imaging coordinate changes of the same world coordinate points under different lens postures, the rotation matrix and translation matrix are computed with a Singular Value Decomposition (SVD) based method; the precision of each element of the rotation matrix is controlled within ±0.005 and the translation matrix is accurate to ±0.1 mm, ensuring that the spatial position and posture parameters of the lens relative to the calibration plate are accurate.
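The per-lens calibration step can be sketched with OpenCV's implementation of Zhang's method; the board layout, square size and image paths below are illustrative assumptions rather than values fixed by the method:

```python
import cv2
import numpy as np
import glob

# Checkerboard inner-corner layout and square size are assumptions for illustration.
BOARD = (9, 6)       # inner corners per row / column
SQUARE_MM = 20.0     # edge length of one square, in mm

objp = np.zeros((BOARD[0] * BOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD[0], 0:BOARD[1]].T.reshape(-1, 2) * SQUARE_MM

obj_points, img_points, img_size = [], [], None
for path in glob.glob("calib_fisheye1/*.png"):   # ~60 calibration images per lens
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img_size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, BOARD)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Solves the intrinsics (focal length, principal point, k1, k2, p1, p2, ...) and
# the per-view extrinsics (rotation/translation) by minimizing the reprojection error.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, img_size, None, None)
print("RMS reprojection error:", rms)
```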
S202, correcting the acquired fisheye image by selecting a distortion correction algorithm.
In this embodiment, for radial distortion correction a second-order polynomial model is used:

x_d = x(1 + k_1 r^2 + k_2 r^4), y_d = y(1 + k_1 r^2 + k_2 r^4),

where (x, y) are the original image pixel coordinates, (x_d, y_d) are the corrected coordinates, r is the distance from the pixel to the distortion center, and k_1, k_2 are the radial distortion coefficients. Taking one fisheye lens image as an example with k_1 = 0.25 and k_2 = -0.08, the corrected coordinates are computed pixel by pixel over the original image according to the formula, and where the coordinates are non-integer, the corrected pixel values are obtained by bilinear interpolation. For example, a pixel at the edge of the original image with coordinates (100.3, 200.7) yields corrected coordinates of (98, 202.5); bilinear interpolation over its four neighboring pixel values gives the accurate corrected pixel value, ensuring the geometric continuity of the image. For tangential distortion, the following formulas are used:

x_d = x + [2 p_1 x y + p_2 (r^2 + 2 x^2)], y_d = y + [p_1 (r^2 + 2 y^2) + 2 p_2 x y],

where p_1, p_2 are the tangential distortion coefficients. The tangential distortion is corrected together with the radial distortion correction step, ensuring that image details, especially textures at the edges, are faithfully restored. If a circular object is photographed, its outline appears elliptical before correction because of tangential distortion, and the circular outline is accurately restored after correction.
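A minimal sketch of applying the above radial and tangential model with bilinear resampling, assuming the principal point (cx, cy) and focal length f come from the calibration of step S201:

```python
import cv2
import numpy as np

def undistort_with_model(img, k1, k2, p1, p2, cx, cy, f):
    """Build a pixel map from the radial + tangential model above and resample
    the fisheye image with bilinear interpolation (a sketch; cx, cy, f are
    taken from the calibration step)."""
    h, w = img.shape[:2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # normalized coordinates relative to the principal point
    x = (u - cx) / f
    y = (v - cy) / f
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 * r2
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # map each output (undistorted) pixel back to its distorted source position
    map_x = (x_d * f + cx).astype(np.float32)
    map_y = (y_d * f + cy).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# Usage sketch with the example coefficients above (principal point and focal length assumed):
# corrected = undistort_with_model(fisheye_img, 0.25, -0.08, 0.001, -0.0005, 960, 540, 800)
```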
The fisheye image is divided into several concentric annular regions, and different correction strategies are adopted according to the differing degrees of distortion in the different regions. For example, in the central region (radius 0-50 pixels), where distortion is relatively small, a low-order polynomial model is used; a simple model corrects it effectively while avoiding the image blurring caused by excessive computation. In the middle annular region (radius 51-150 pixels), a higher-order polynomial is combined with a linear stretching transformation, adjusting pixel-distribution uniformity while correcting distortion. In the edge region (radius 151 pixels to the image boundary), a correction method based on a mapping lookup table (LUT) is introduced: a large number of typical coordinate-distortion mappings are pre-computed from the calibration parameters and stored in the LUT, and during real-time correction the corrected coordinates are obtained by fast indexing, improving the speed and accuracy of edge correction. If a city panorama is photographed, the outlines of buildings at the edge are corrected by this segmented model; the lines are straight and the details clear, without obvious jagging or stretching deformation.
S203, generating corrected fish-eye images.
During the correction process, the image is monitored and optimized in real time. The sharpness of the corrected image is evaluated using image gradient information; if a local area is found to be blurred by the correction (for example, the gradient magnitude falls below a set threshold), the interpolation-algorithm parameters are adjusted automatically or the local smoothing-filter strength is increased, recovering image detail while keeping the geometry accurate. For example, in a natural landscape image taken by a fisheye lens, the edges of the leaves were initially blurred after correction; by adaptively adjusting the filtering parameters, the leaf edges are re-sharpened so that the leaf texture is clear and natural.
The color consistency of the image is also maintained: distortion correction may affect the color distribution, so a color-histogram-based matching technique is used to compare and match the color histogram of the corrected image with that of the original calibration-plate image, and the gains of the red, green and blue channels are adjusted so that the image colors are not distorted. If flowers are photographed, the color vividness of the corrected flowers is consistent with the real scene.
Through steps S201-S203, the distortion in the fisheye lens images is effectively eliminated by means of accurate calibration parameters and adapted algorithms: the fisheye image distortion is accurately corrected, straight objects image as straight lines again, and the outlines of circular objects show no deformation.
S30, matching the motion characteristics of the corrected fisheye image and the central lens image.
In one embodiment, step S30 includes the steps S301-S304.
S301, detecting feature points of the corrected fisheye image and the central lens image by using a feature detector.
In the present embodiment, harris angle point detectors are selected for the center lens image for feature point detection, and FAST detectors are selected for the eye image for feature point detection.
S302, calculating feature descriptors of surrounding pixels of each detected feature point, wherein resolution related information among pixels is enhanced when the center lens image feature descriptors are calculated, and view field related information is enhanced when the fisheye image feature descriptors are calculated.
In this embodiment, the central-lens image feature descriptor strengthens resolution information as follows: a 3×3 neighborhood window centered on the feature point is selected, the window is subdivided into 9 sub-regions, a gradient-magnitude histogram over 8 directions (0°-360° divided equally into 8 intervals) is computed for each sub-region, and the histograms are concatenated into a 9×8 = 72-dimensional feature vector. When calculating gradient magnitude and direction, a high-precision Sobel operator is applied, and the operator kernel is finely tuned to the sampling characteristics of the central-lens pixels so that it is better suited to extracting the details of the high-resolution image. For example, when shooting a microscopic image of a circuit board, feature points arising from fine structures such as trace intersections and solder-joint edges can, through this resolution-sensitive HOG-style descriptor, accurately encode the fine gray-level change direction and intensity differences between pixels, so that different circuit detail features can be distinguished; even for similar circuit layouts, the Euclidean distance between the feature descriptors can exceed 10, effectively improving subsequent matching discriminability.
The fisheye image feature descriptors highlight field-of-view characteristics by constructing descriptors from ring-partition statistics combined with radial gradient features. A circular area centered on the feature point with radius R (determined dynamically from the fisheye field of view and the feature-point position; R is generally 10-50 pixels) is divided into 3 concentric annular bands. For each band, the gray-gradient integral along the radial direction is computed, and statistics such as the gray mean and variance of the pixels in the band are counted; these are combined into a 3×4 = 12-dimensional basic feature vector, and together with the distance from the feature point to the image center and the relative angle information of the bands, a 20-dimensional descriptor is finally generated. In panoramic natural-landscape shooting, for large-field-of-view features captured by the fisheye lens such as lake-shore curves and mountain contours, this descriptor effectively summarizes the feature-distribution trend within the field of view and the relative position with respect to the image center; even under different illumination conditions or partial occlusion of the scene, the matching success rate for similar contour features can still be kept above 80%, enhancing the matching robustness of fisheye image features.
S303, comparing the feature descriptors of the fisheye image with the feature descriptors of the central lens image, and finding out all feature point pairs meeting the similarity threshold.
In this embodiment, feature-descriptor similarity is computed with the normalized cross-correlation (NCC) method. For each pair of central-lens and fisheye-image feature descriptors, the NCC value is computed, with the initial similarity threshold set to 0.7. This threshold was obtained by statistical analysis of matching experiments on a large number of combined images from different scenes (indoor, outdoor, static, dynamic, etc.), reducing false matches as much as possible while ensuring a sufficient number of matches. For example, when matching furniture and decoration details in an indoor home scene, a potential matching feature-point pair can be preliminarily identified whenever the NCC value exceeds 0.7; this preliminary screening quickly filters a set of potentially matching point pairs out of the massive number of feature-point combinations, amounting to roughly 20%-30% of all combinations, which greatly reduces the computation of the subsequent precise matching while retaining most of the true matching pairs. During NCC computation, the integral-image technique is used to quickly obtain the pixel sum and squared sum of local regions, reducing repeated computation; for 1000×1000-pixel images, the feature-point matching time is shortened by about 40% compared with the traditional pixel-by-pixel approach, meeting the needs of imaging systems with higher real-time requirements.
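A minimal NCC-matching sketch under the 0.7 threshold; it assumes the two descriptors have already been brought to a common length, and keeps only the best fisheye candidate per central-lens descriptor:

```python
import numpy as np

def ncc(d1, d2):
    """Normalized cross-correlation between two feature descriptors
    (assumed to have the same length)."""
    a = d1 - d1.mean()
    b = d2 - d2.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_descriptors(center_desc, fisheye_desc, threshold=0.7):
    """Return index pairs (i, j, score) whose NCC exceeds the similarity threshold;
    a brute-force sketch, without the integral-image acceleration mentioned above."""
    pairs = []
    for i, dc in enumerate(center_desc):
        scores = [ncc(dc, df) for df in fisheye_desc]
        j = int(np.argmax(scores))
        if scores[j] > threshold:
            pairs.append((i, j, scores[j]))
    return pairs
```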
S304, eliminating the feature point pairs which are mismatched.
In one embodiment, step S304 includes steps S3041-S3048.
S3041, randomly selecting a part of characteristic points from all characteristic point pairs to be used as a characteristic point pair subset.
In this embodiment, a proportional random sampling rule is set in consideration of computational efficiency and model representativeness. For the whole set containing N feature-point pairs, the extraction proportion is between 70% and 85%, depending on scene complexity and data size. For example, when shooting an indoor conference scene whose features are relatively simple and uniformly distributed, with N = 500 feature-point pairs, 75% of the pairs are extracted to form a subset of 375 pairs; for a complex and changeable outdoor sports-event scene, the number of feature-point pairs can reach 2000, and 80% of them are extracted to form the subset, so that the subsequent model estimation covers sufficiently rich image feature information while avoiding the inefficiency caused by excessive computation.
During sampling, a stratified-sampling idea is introduced: the feature points are stratified according to their position in the image (central region, middle annular band, edge region) and their density (sparse regions, dense regions). For example, in a drone-captured urban panorama, more pairs are drawn from the feature-dense stratum of the building-filled city center and proportionally fewer from sparse strata such as open parks, while ensuring that every stratum contributes enough samples to the initial model construction, so that the transformation model accounts for the characteristics of every part of the image from the start and its generality is improved.
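A minimal stratified proportional-sampling sketch, assuming a caller-supplied strata_of function that maps each feature-point pair to its region/density stratum:

```python
import random

def stratified_sample(pairs, strata_of, ratio=0.8, min_per_stratum=5):
    """Draw a proportional random subset of feature-point pairs, stratified by
    image region / density label (a sketch; strata_of is an assumed helper)."""
    by_stratum = {}
    for p in pairs:
        by_stratum.setdefault(strata_of(p), []).append(p)
    subset = []
    for members in by_stratum.values():
        # proportional draw, but keep at least a few samples per stratum
        k = max(min_per_stratum, int(round(len(members) * ratio)))
        k = min(k, len(members))
        subset.extend(random.sample(members, k))
    return subset
```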
S3042, constructing a transformation model of relative position parameters and distortion parameters of the center lens and the four fish-eye lenses, and solving model parameters by utilizing data of the feature point pair subset.
In this embodiment, a model structure based on perspective transformation is selected, fully taking into account the complex relative positions and imaging-distortion relationships of the central lens and the four fisheye lenses. The model parameters include the rotation angles (three dimensions: θx, θy, θz about the X, Y, Z axes), the translation amounts (Tx, Ty, Tz), the scaling factors (Sx, Sy, Sz), the radial distortion coefficients (k1, k2, for fisheye-lens correction compensation) and the tangential distortion coefficients (p1, p2). For example, in a vehicle-mounted multi-lens surround-view system, the relative positions of the lenses may change slightly due to jolting while the vehicle is moving, and these parameters can dynamically capture the real-time geometric changes between the lenses to ensure accurate image matching.
When the model parameters are solved using the data of the feature-point-pair subset, an objective function is constructed on the least-squares principle, the coordinate correspondence of the feature-point-pair subset between the source image (fisheye image) and the target image (central-lens image) is substituted in, and the optimal parameter solution is found with an iterative optimization algorithm (such as the Levenberg-Marquardt algorithm). Taking an industrial inspection scene as an example, with many feature points imaged and matched on the product surface, after repeated iterations (generally 10-15), the convergence precision of the parameters reaches a rotation-angle error of ±0.1°, a translation error of ±0.5 mm, a scaling-factor error of ±0.02 and a distortion-coefficient error of ±0.001, laying a solid foundation for the subsequent accurate reprojection-error calculation.
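A sketch of solving the transformation parameters by minimizing reprojection residuals with SciPy's Levenberg-Marquardt solver; the project function standing in for the perspective-plus-distortion model above is an assumption:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(params, src_pts, dst_pts, project):
    """Stack x/y reprojection residuals over the feature-point-pair subset.
    params holds the rotation, translation, scaling and distortion terms of the model."""
    projected = project(src_pts, params)   # fisheye coordinates -> central-lens coordinates
    return (projected - dst_pts).ravel()

def solve_model(src_pts, dst_pts, project, x0):
    """Levenberg-Marquardt fit of the transformation model to the sampled subset."""
    result = least_squares(residuals, x0, args=(src_pts, dst_pts, project),
                           method="lm", max_nfev=200)
    return result.x, result.cost
```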
S3043, calculating the re-projection errors of all the matched characteristic point pairs according to the transformation model.
S3044, dividing the matched characteristic points into inner points and outer points according to the re-projection errors, wherein the inner points are points with the re-projection errors smaller than a set threshold value, and the outer points are points with the re-projection errors larger than the set threshold value.
For steps S3043-S3044, in this embodiment the reprojection error is computed as follows: according to the solved transformation model, the feature-point coordinates in the fisheye image are projected into the central-lens image coordinate system through the model transformation, and the Euclidean distance between the projected coordinates and the actual coordinates of the corresponding matched feature point in the central-lens image is taken as the reprojection error. In an intelligent warehouse monitoring scene, the reprojection error is computed one by one for feature points on shelves, cargo stacks and the like, with the formula:

error = sqrt((x_proj - x_actual)^2 + (y_proj - y_actual)^2),

where (x_proj, y_proj) are the projected coordinates and (x_actual, y_actual) are the actual coordinates.
When setting the reprojection-error threshold, the lens resolution, scene scale and imaging-precision requirements are considered together. For high resolution (for example above 50 megapixels), small scenes (such as photographing precision laboratory instruments) and cases requiring high-precision matching, the threshold is set to 1-2 pixels; for low resolution and large scenes (such as high-altitude urban panoramic monitoring), the threshold is relaxed to 3-5 pixels. For example, in museum cultural-relic security monitoring, to accurately capture displacement details of the relics, the threshold is set to 1.5 pixels, which effectively distinguishes reliable matches from mismatched point pairs and guarantees the integrity of the subsequent image fusion of the relics.
All matched feature-point pairs are traversed: pairs whose reprojection error is smaller than the set threshold are marked as inliers and included in the subsequent model optimization, while pairs whose error exceeds the threshold are judged as outliers (mismatches) and stored temporarily for later analysis (for example, to count the causes of mismatching). In outdoor construction-site monitoring, about 60%-70% of the initial matching pairs are confirmed as inliers by this reprojection-error screening; the outliers mainly stem from feature-extraction deviations caused by site dust, equipment occlusion and illumination shadows, and removing them in a targeted way greatly improves image-registration reliability.
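A minimal sketch of the inlier/outlier split by reprojection error, using the fitted model and the pixel threshold chosen in S3044 (1.5 pixels in the museum example):

```python
import numpy as np

def split_inliers(src_pts, dst_pts, project, params, threshold=1.5):
    """Classify matched pairs by reprojection error (in pixels): error < threshold
    means inlier, otherwise outlier. project/params are the fitted transformation model."""
    proj = project(src_pts, params)
    err = np.sqrt(((proj - dst_pts) ** 2).sum(axis=1))  # Euclidean distance per pair
    inlier_mask = err < threshold
    return inlier_mask, err
```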
S3045, re-estimating the transformation model by using the inner points.
S3046, updating the transformation model according to iteration requirements to obtain a plurality of transformation models.
For steps S3045-S3046, in this embodiment the transformation-model parameters are re-estimated using the newly screened inlier set, again solving with the least-squares method combined with the optimization algorithm. At this point the inliers reflect the true correspondence between the images more accurately, and the model-parameter update converges in a more accurate direction. Taking a robot visual-navigation scene as an example, lens imaging is affected by environmental dynamics while the robot moves; each model re-estimation adjusts the parameters in real time. For instance, if the perspective deformation of a fisheye lens worsens because an obstacle approaches, the radial distortion coefficients and translation parameters are corrected in time through inlier feedback, ensuring visual-positioning accuracy; after each iteration the model-parameter error is reduced by roughly 30%-40% compared with the previous round.
An upper limit on the number of iterations is set, generally 3-5, to prevent excessive iteration from falling into a local optimum or wasting computing resources and hurting real-time performance. A dynamic termination condition is also introduced: the rate of change of the parameters between two adjacent iterations is monitored, and when it falls below a small set value (for example, rotation-angle change below 0.01°/iteration and translation change below 0.1 mm/iteration), the iteration is terminated early. In real-time drone surveying-and-mapping scenes, where images are collected during fast flight over complex terrain, this guarantees model accuracy while completing model updates within seconds, meeting real-time data-processing requirements.
S3047, selecting a transformation model with the largest number of inner points and highest matching degree with the layout and imaging characteristics of the center lens and the four fisheye lenses from a plurality of transformation models as a final transformation model, wherein the layout and imaging characteristics matching degree is measured by calculating the difference degree between the model and the geometric optical models of the center lens and the fisheye lenses.
S3048, the feature points which do not accord with the final transformation model are regarded as mismatching, and the feature points are removed.
For steps S3047-S3048, in this embodiment, the degree of matching is measured by calculating the degree of difference between the model and the geometric optical model of the center lens and the fisheye lens. Based on the optical imaging principle, the difference between the perspective transformation model and the ideal lens imaging ray trace model in the aspects of ray propagation path, imaging visual angle, projection relation and the like is compared. For example, indexes such as a mean value of deviation between a predicted light incident angle of a model and an incident angle of an actual lens optical model, a sum of squares of differences between an imaging size scaling and a theoretical value and the like are calculated, the matching degree of the model is quantized comprehensively, and the lower the difference degree, the higher the matching degree is. The index can accurately screen out a transformation model which is most fit with the imaging physical rule of the real lens in the 3D modeling multi-lens acquisition scene, and the geometric consistency of the model in the complex 3D scene reconstruction is ensured.
The feature-point pairs that do not conform to the finally selected optimal transformation model are thoroughly eliminated. In panoramic-video stitching applications, this strict screening essentially eliminates the ghosting and tearing caused by mismatches at the seams between adjacent frames; the picture is smooth and natural, the stitching precision reaches sub-pixel level, the visual quality of the panoramic video is greatly improved, and a stable, high-quality image basis is provided for immersive virtual-reality experiences, long-duration security-monitoring playback, and the like.
For steps S3041-S3048, through this fine mismatch-rejection process, the screened feature-point pairs accurately reflect the true geometric correspondence between the central-lens and fisheye-lens images, so the subsequent image-registration error is kept within a very small range (sub-pixel level), enabling seamless connection in panoramic stitching and multi-modal image fusion; for example, the target trajectory in a 360° security-monitoring panorama is continuous and smooth, without image jumps or misalignment caused by inaccurate registration. Moreover, whether the light changes drastically (such as the light-and-shadow transitions of sunrise and sunset), the scene contains strong dynamic interference (such as heavy, continuous traffic at an intersection) or the lens state itself fluctuates (such as drone flight vibration), the scheme can adaptively adjust the transformation model, eliminate mismatches produced by environmental factors and keep imaging stable. In an intelligent traffic-violation evidence-collection multi-lens system, the features of the offending vehicle can still be accurately locked under complex road conditions, with a misjudgment rate below 2%, improving the accuracy of law-enforcement evidence.
And S40, registering the corrected fisheye image to the view angle of the central lens image based on the motion characteristic matching result.
In one embodiment, step S40 includes the steps S401-S403.
S401, carrying out relative motion estimation on the corrected fisheye image and the central lens image according to the matched characteristic point pairs, and estimating by adopting a motion estimation model based on an optical flow method in combination with the stability of the installation structure of the central lens and the four fisheye lenses and the dynamic change condition of the overlapping area of the visual fields in the motion estimation process.
In this embodiment, the stability of the mounting structure of the center lens and the four fisheye lenses has an important influence on the relative motion estimation. For example, in a vehicle-mounted multi-lens imaging system, vibration is generated during the running of a vehicle, but the lens is fixed on the vehicle body through a firm anti-shake mounting bracket, and the overall relative position of the lens is kept stable to a certain extent, but small low-frequency shake is possible. In the motion estimation model, constraint information of the installation structure is added, and reasonable displacement and rotation range limiting parameters are set. For example, according to the mechanical performance of the mounting bracket and past experimental data, the translational displacement fluctuation range in the horizontal direction and the vertical direction is set within +/-0.5 mm, the rotation angle change range around each coordinate axis is within +/-0.2 degrees, the optical flow calculation result which does not accord with the actual physical constraint is filtered by taking the optical flow calculation result as priori knowledge, and the accuracy and the reliability of motion estimation are improved.
Because the fields of view of the lenses overlap one another, movement of objects in the overlap region or changes in lens posture cause the overlap region to change dynamically in different shooting scenes. Taking a multi-lens system monitoring indoor personnel activity as an example, when a person walks through the overlap region of the central lens and a fisheye lens, the feature distribution in the overlap region changes from moment to moment. In the optical-flow model, dynamic weights are set for the overlap region: feature-point pairs in the overlap region are given higher weights (for example 0.8, with pairs in non-overlapping regions weighted 0.2), so that the motion estimation focuses on the feature changes of the overlap region, whose correspondence is more critical for subsequent image registration, and thus reflects the relative motion state between the images more accurately.
Suppose a multi-lens shooting scene at a sports event, where the central lens focuses on the actions of the key athletes and the fisheye lenses cover the surrounding spectators and the overall scene. Between two adjacent frames, the displacement of the feature-point pairs is calculated to be 5 pixels horizontally (about 0.5 m of actual horizontal displacement on site, given the conversion between image resolution and real-world distance) and 3 pixels vertically (about 0.3 m actual displacement); meanwhile, from the joint calculation over many feature-point pairs, the overall small rotation about a coordinate axis is about 0.1°. This displacement and rotation information forms the preliminary result of the relative motion estimation and provides the key data for building the subsequent geometric transformation matrix.
S402, constructing a geometric transformation matrix according to the motion estimation result and combining the optical imaging center coordinate relation and the field angle relation of the center lens and the four fish-eye lenses.
In this embodiment, the central lens and the four fisheye lenses have different optical imaging-center coordinates, and the relative positional relationship of these coordinates in three-dimensional space determines the geometric transformation between the images. Through the prior precise lens-calibration process, the coordinates of each lens's optical imaging center in a unified coordinate system are obtained (for example, the imaging center of the central lens at (0, 0, 0) and that of one fisheye lens at (10 mm, 5 mm, 3 mm), with the coordinate precision determined by the actual calibration accuracy, generally ±0.1 mm). When constructing the geometric transformation matrix, these coordinate differences are incorporated into the calculation of the translation component. For example, when calculating the translation elements for the view-angle transformation from a fisheye image to the central-lens image, the distances to translate along the X, Y and Z directions are computed from the coordinate differences, ensuring that the images can be aligned correctly in spatial position.
The angles of view of the different lenses differ, and the differences in coverage and angular range must be reflected in the geometric transformation. For example, when the central lens has a 60° (horizontal) angle of view and a fisheye lens has a 180° (horizontal) angle of view, the scaling factors, rotation angle and other parameters for registering the fisheye image to the central-lens view must be determined from the ratio of the angles of view and the correspondence of the imaging regions. Using trigonometric relationships and imaging-geometry principles, the scaling of the fisheye image relative to the central-lens image in the horizontal and vertical directions (for example 0.5 horizontally and 0.6 vertically) and the possible rotation angle (such as 10° clockwise) are computed; these scaling and rotation parameters, together with the translation parameters above, are integrated into a complete affine transformation matrix that matches the actual imaging characteristics of the lenses.
As a simplified illustration for the two-dimensional plane case (the actual transformation is in three-dimensional space, with the same principle), assume the above calculation yields translation parameters (tx = 10, ty = 5) (in pixels), scaling factors (sx = 0.8, sy = 0.7) and a clockwise rotation angle θ; the affine transformation matrix (in homogeneous-coordinate form) is then constructed as

    [  sx·cosθ   sy·sinθ   tx ]
    [ -sx·sinθ   sy·cosθ   ty ]
    [     0         0       1 ]
This matrix is the geometric transformation matrix required to register the fisheye image to the central lens image perspective for subsequent geometric transformation operations.
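A sketch of composing the translation, rotation and scaling parameters into the homogeneous matrix above (a 2-D simplification; the rotation-angle value in the usage line is an assumed illustration):

```python
import numpy as np

def build_affine(tx, ty, sx, sy, theta_deg):
    """Compose translation, rotation (clockwise positive here) and scaling into a
    single 3x3 homogeneous matrix, M = T @ R @ S (simplified 2-D sketch)."""
    th = np.deg2rad(theta_deg)
    S = np.array([[sx, 0, 0], [0, sy, 0], [0, 0, 1]], dtype=float)
    R = np.array([[np.cos(th),  np.sin(th), 0],   # clockwise rotation
                  [-np.sin(th), np.cos(th), 0],
                  [0, 0, 1]], dtype=float)
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1]], dtype=float)
    return T @ R @ S

# Usage sketch with the example values above (rotation angle assumed to be 10 degrees):
# M = build_affine(tx=10, ty=5, sx=0.8, sy=0.7, theta_deg=10)
```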
S403, performing geometric transformation and image compensation on the corrected fisheye image according to the geometric transformation matrix so as to align the corrected fisheye image with the central lens image in view angle.
In one embodiment, step S403 includes the steps S4031-S4033.
S4031, carrying out coordinate transformation on each pixel point on the corrected fisheye image according to the transformation matrix.
In this embodiment, for each pixel of the corrected fisheye image, coordinate transformation is performed according to the constructed geometric transformation matrix. For example, a pixel with coordinates (x, y) in the fisheye image is multiplied by the affine transformation matrix (computed in homogeneous coordinates) to obtain the transformed coordinates (x', y'); in the simplified two-dimensional form:

    [ x' ]       [ x ]
    [ y' ] = M · [ y ]
    [ 1  ]       [ 1 ]
S4032, calculating the pixel value of the fisheye image after coordinate transformation by using an interpolation method.
After coordinate transformation, the newly obtained pixel coordinates are often non-integer, while image pixel values are defined only at integer positions, so bilinear interpolation is used to obtain accurate pixel values after the transformation. For example, if the transformed coordinates are (10.3, 20.7), the pixel value at that position is computed from the four surrounding integer-coordinate pixels ((10, 20), (10, 21), (11, 20), (11, 21)) with bilinear-interpolation weights, ensuring the continuity and plausibility of the pixel values after the geometric transformation and avoiding jagged or blurred artifacts.
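Steps S4031 and S4032 together amount to resampling the fisheye image under the transformation; a minimal sketch using OpenCV's bilinear warp, assuming M is the 3×3 matrix from step S402 and the target canvas has the central-lens image size:

```python
import cv2

def register_to_center_view(fisheye_img, M, center_w, center_h):
    """Apply the geometric transformation to every pixel and resample with
    bilinear interpolation, producing an image aligned to the central-lens view."""
    return cv2.warpPerspective(
        fisheye_img, M, (center_w, center_h),
        flags=cv2.INTER_LINEAR,           # bilinear interpolation at non-integer coordinates
        borderMode=cv2.BORDER_CONSTANT)   # pixels falling outside the source stay black

# Usage sketch: aligned = register_to_center_view(corrected_fisheye, M, W_center, H_center)
```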
And S4033, mapping the pixel value of the converted fisheye image to the view angle of the central lens image according to the interpolation result.
Through the preceding coordinate transformation and interpolation, the fisheye image has been adjusted in geometry and pixel values; it must then be placed accurately under the viewing angle of the central-lens image so that the two are fully aligned in view angle, which is the purpose of the pixel-value mapping.
In implementation, taking the coordinate system of the central-lens image as the reference, the transformed and interpolated fisheye pixel values are filled into the corresponding regions according to their coordinate positions. For example, if the coordinate range of the central-lens image runs from (0, 0) to (W_center, H_center), where W_center is the number of pixels in the width direction and H_center the number in the height direction, the transformed fisheye pixel coordinates are appropriately scaled and offset, and their pixel values are then filled into the corresponding coordinate positions of the central-lens image.
If the partial area of the fisheye image exceeds the coordinate range of the central lens image in the mapping process, the fisheye image can be processed according to the specific application scene. For example, in panoramic stitching and other applications, the excess part can be cut or naturally transited with the center lens image by the boundary processing technology of image fusion, and in some scenes needing to retain all information, the display range of the center lens image can be enlarged or the whole fisheye image transformed content can be displayed in a rolling viewing mode and other modes.
Taking the application of multi-lens imaging of unmanned aerial vehicle aerial city panorama as an example, the central lens focuses on the building area of city landmark, and the fisheye lens shoots the large field of view of surrounding streets, building groups and the like. After the coordinate transformation, interpolation and mapping operation, the image information of the distant streets, building outlines and the like shot by the fisheye lens can be accurately fused under the view angle of the central lens image, such as a curved street originally at the edge of the fisheye image, after transformation and mapping, the image information naturally appears in the edge area of the central lens image in a geometric shape and pixel expression form conforming to the view angle of the central lens, and the image information is seamlessly connected with landmark building images shot by the central lens, so that a complete, continuous and uniform-view-angle urban panoramic image is formed, and a good basis is provided for subsequent image fusion, analysis and other operations.
For steps S4031-S4033, high accuracy alignment of the corrected fisheye image and the central lens image in view angle can be ensured by detailed coordinate transformation, interpolation calculation, and view angle mapping steps. In application scenes such as panoramic image stitching and multi-lens monitoring systems, the stitching part between images is natural in transition, obvious problems such as dislocation, gaps or image distortion cannot occur, so that a finally generated large-view-field high-resolution image is more real and complete, and better image data support can be provided for the fields such as security monitoring, geographical mapping and virtual reality.
S50, fusion processing is carried out on the registered images.
In one embodiment, step S50 includes the steps S501-S508.
S501, obtaining the spectrum response characteristic difference of the center lens and the four fisheye lenses.
The spectral response characteristics of the central lens and the four fisheye lenses can be measured in advance by a high-precision spectrum analyzer, specifically, a high-precision spectrum analyzer such as the USB4000 model of Ocean Optics is used, the wavelength measuring range of the high-precision spectrum analyzer covers visible light and near infrared bands (350-1000 nm), and the spectral resolution can reach 1.5nm. The center lens and the four fisheye lenses are sequentially arranged on a fixed clamp of an optical platform, so that the optical axis of the lens is ensured to be strictly parallel to the incident optical axis of the spectrum analyzer, and the error is controlled within +/-0.1 degrees, so that the measurement accuracy is ensured.
The measuring environment is set as a darkroom, stray light interference is eliminated, the internal temperature is constant at 25 ℃ plus or minus 1 ℃, the humidity is maintained at 40% -60%, and the influence of environmental factors on the performance of optical elements of the lens and the spectrum measuring result is avoided.
For each lens, a standard D65 light source (color temperature 6500 K, color rendering index greater than 95) is used for illumination, introduced uniformly through optical fibers and shone perpendicularly onto a Spectralon standard white board with diffuse reflectance close to 100%; the lens collects the light reflected from the white board and focuses it onto the fiber probe of the spectrum analyzer. For each lens, light-intensity values are collected at many wavelength points (at least 500 data points distributed uniformly over the measured band), the measurement is repeated 5 times and averaged, and the spectral response curve of the lens is recorded. For example, the central lens responds more strongly in the 550 nm green band, with an intensity value of 800 counts, while a certain fisheye lens responds relatively weakly in that band, at only 600 counts; accurately quantifying the spectral-response differences of the lenses provides the key data basis for the subsequent color correction.
S502, performing color deviation analysis on the images collected by the fisheye lenses and the central lens under a standard color chart to construct a color correction matrix.
S503, performing color correction on the registered fisheye image and the center lens image according to the color correction matrix.
For steps S502-S503, an X-Rite ColorChecker standard color chart is selected and placed at the center of the lens field of view, ensuring that the chart fills about 30% of the image area so that sufficient color information is obtained. The color-chart image is photographed with the central lens and each of the four fisheye lenses, with the image resolution set to no less than 2000×3000 pixels, ensuring that the color-patch boundaries are clearly distinguishable and that each patch provides no fewer than 100 sampled pixels.
The color space of the captured images is converted, transforming the RGB data into the CIE Lab uniform color space, which better matches human visual perception and makes it easier to quantify color deviation accurately. For example, in the central-lens image, the measured coordinates of the red patch of the standard chart (CIE Lab theoretical values L = 53.24, a = 80.09, b = 67.20) are L = 52.80, a = 78.50, b = 65.80; the difference from the theoretical values is computed, and the same operation is performed for every patch in each fisheye-lens and central-lens image to accumulate the color-deviation data.
Based on the collected color-deviation data, a 3×3 color correction matrix M is constructed with the least-squares method. Let the RGB value of an original-image pixel be the vector [r g b]^T and the corrected RGB value be [r' g' b']^T; the linear system M · [r g b]^T = [r' g' b']^T is solved so that the corrected image colors are as close as possible to the theoretical colors of the standard chart. For example, the computed correction-matrix elements are M(1,1) = 1.02, M(1,2) = -0.03, M(1,3) = 0.01, and so on; the matrix is further refined with operations such as singular value decomposition to ensure its stability and accuracy and to reduce noise amplification or color distortion during correction. The registered fisheye images and the central-lens image are then color-corrected pixel by pixel with this matrix, restoring true colors.
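A minimal sketch of fitting and applying the 3×3 color correction matrix by least squares from measured versus theoretical color-chart patch values (the SVD-based stabilization mentioned above is omitted):

```python
import numpy as np

def fit_color_matrix(measured_rgb, target_rgb):
    """Least-squares fit of a 3x3 correction matrix M so that M @ measured ~= target.
    measured_rgb / target_rgb: (N, 3) arrays of color-chart patch values."""
    A, _, _, _ = np.linalg.lstsq(measured_rgb, target_rgb, rcond=None)  # A = M^T
    return A.T

def apply_color_matrix(img, M):
    """Apply M to every pixel of an HxWx3 image and clip to the valid range."""
    flat = img.reshape(-1, 3).astype(np.float32) @ M.T
    return np.clip(flat, 0, 255).reshape(img.shape).astype(np.uint8)
```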
S504, acquiring the illumination reflection difference of the central area and the peripheral area.
S505, calculating the average brightness of different areas based on a brightness adjustment algorithm of area segmentation, and adjusting the brightness according to a preset brightness balance rule.
For steps S504-S505, in the actual shooting scene, the illumination intensity and reflectivity of the central area and the peripheral area are measured by using illumination sensors (e.g. Hagner E4 type all-sky irradiance sensor), respectively. The sensor is arranged at the center of the central lens view field and at the typical positions (such as edges and corners) of the peripheral fisheye lens view field, illumination data in different periods (such as early morning, midday and evening) and different days (sunny day, cloudy day and cloudy day) are collected, the measurement time interval of each position is 1 hour, the data are continuously collected for 24 hours, the statistical mean value is obtained, and the illumination reflection difference characteristic is accurately obtained. For example, in sunny noon, the illumination intensity of the central area reaches 1000lux, the reflectivity is 0.3, the illumination intensity of the peripheral area is only 600lux, the reflectivity is 0.25, and the condition of unbalanced brightness is clear.
A region-division method based on image gray-histogram threshold segmentation and morphological processing is used to divide the image into a central region, a middle transition region and a peripheral region. For example, a bimodal analysis of the image gray histogram determines thresholds T1 = 120 and T2 = 200: gray values below T1 are assigned to the peripheral region, values above T2 to the central region, and the transition region lies between the two. For each region, the average brightness is computed (the total gray sum of the region is obtained quickly with an integral-image algorithm and then averaged); according to a preset brightness-balance rule, for example keeping the central-to-peripheral brightness ratio within 1.2-1.5, the darker peripheral region is brightened with algorithms such as linear stretching and gamma correction, and the gain of an over-bright central region is reduced appropriately, so that the overall image brightness is uniform and natural.
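A minimal sketch of the threshold-based region split and brightness balancing; the gamma and gain values are illustrative assumptions:

```python
import numpy as np

def balance_brightness(gray_img, t1=120, t2=200, gamma_peripheral=0.8, gain_center=0.95):
    """Split the image into peripheral / transition / central regions by gray thresholds,
    lift the darker periphery with gamma correction and slightly reduce the gain of
    the bright center (illustrative values only)."""
    img = gray_img.astype(np.float32) / 255.0
    out = img.copy()
    peripheral = gray_img < t1
    central = gray_img > t2
    out[peripheral] = np.power(img[peripheral], gamma_peripheral)  # gamma < 1 brightens
    out[central] = img[central] * gain_center                      # tame the bright center
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)
```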
S506, analyzing the lens field model based on the field overlapping theoretical model and the actual imaging calibration result of the center lens and the four fish-eye lenses to calculate an overlapping area.
According to the optical design parameters of the central lens and the four fisheye lenses (focal length, angle of view, image-plane size, etc.), a field-of-view overlap theoretical model is constructed using geometric-optics principles. In a three-dimensional coordinate system with the intersection point of the lens optical axes as the origin, the field-of-view boundary equation of each lens is computed through trigonometric relationships and the theoretical overlap region is determined. For example, given a central-lens angle of view of 60°, a fisheye angle of view of 180°, and focal lengths of 50 mm and 8 mm respectively, the overlap boundary angles at various distances from the center in the horizontal direction are calculated, and a mathematical model describing the geometry of the overlap region is established.
S507, fusing images in the overlapping area by using a weight distribution strategy, and dynamically adjusting weight distribution according to image quality evaluation indexes and field of view importance of the center lens and the four fish-eye lenses.
The image quality factors such as definition, contrast, noise level and the like are comprehensively considered. The sharpness adopts a variance computing method based on gradients, gradient amplitude variances are computed for each small area (such as 8 multiplied by 8 pixel blocks) of the image, the sharpness is higher as the variances are larger, the contrast is measured by computing the dynamic range (the difference between the maximum value and the minimum value) of the gray level histogram of the image, the noise level is evaluated by utilizing local standard deviation statistics, and the small standard deviation indicates low noise. For example, the sharpness index value of the central region of the central lens image reaches 80 (normalized to the range of 0-100), the contrast ratio is 40, the noise level is 5, and the corresponding indexes of the peripheral region of a certain fisheye lens are respectively 60, 30 and 8, so that the image quality difference of each region is quantized.
The importance of each field of view is determined according to the shooting target and the application scene. In a security-monitoring scene focusing on key areas, the importance of the central-lens field of view is set to 0.7 and the combined importance of the peripheral fisheye fields of view to 0.3; in a panoramic display scene the fisheye weight is raised appropriately, for example 0.4 for the central lens and an average of 0.15 per fisheye lens. The fusion weights are then allocated dynamically in combination with the image-quality indices: if the sharpness of a fisheye image in the overlap region suddenly improves (for example because the local light is good), its weight is raised accordingly, so that the fused image uses the best-imaged part of each lens in real time and produces a high-quality fusion result. A weighted-average fusion algorithm computes the fused RGB value of each pixel in the overlap region from these weights, achieving a smooth image transition.
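A minimal weighted-average fusion sketch for the overlap region, assuming per-pixel quality maps q_center and q_fisheye have already been computed from the sharpness/contrast/noise indices of step S507:

```python
import numpy as np

def fuse_overlap(center_img, fisheye_img, q_center, q_fisheye,
                 importance_center=0.7, importance_fisheye=0.3):
    """Weighted-average fusion of the overlap region. q_* are per-pixel quality maps
    (sharpness/contrast/noise combined into one score); the weights mix quality with
    the field-of-view importance chosen for the application (a sketch)."""
    w_c = importance_center * q_center
    w_f = importance_fisheye * q_fisheye
    total = w_c + w_f + 1e-6                       # avoid division by zero
    fused = (w_c[..., None] * center_img.astype(np.float32) +
             w_f[..., None] * fisheye_img.astype(np.float32)) / total[..., None]
    return np.clip(fused, 0, 255).astype(np.uint8)
```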
S508, smoothing the edge of the fusion area.
The edge of the fusion region is detected with the Sobel operator combined with morphological dilation and erosion, obtaining a transition band about 3-5 pixels wide. For example, at a panoramic stitching seam, the boundary pixel positions are determined by edge detection and the band is widened by 2 pixels on each side to form the transition region, which is then smoothed to avoid abrupt fusion traces.
A smoothing method based on Gaussian weighted averaging is adopted: within the transition band, a 5×5 neighborhood window is centered on each pixel, the neighborhood weights are computed from a Gaussian distribution (pixels closer to the center receive larger weights), and the weighted average of the neighborhood pixels replaces the original pixel value. This is iterated several times (generally 3-5) so that the brightness and color across the fused edge change gradually and naturally, eliminating stitching seams and abrupt color changes and improving the overall visual quality of the image.
For steps S501-S508, through accurate measurement of the lens spectral response and color-chart correction, the color deviation between the images collected by different lenses is greatly reduced; illumination-reflection compensation and brightness adjustment make the overall image brightness uniform, without obvious contrast between bright and dark regions, markedly improving visual comfort. In night-scene monitoring this avoids target loss caused by an over-bright center and over-dark surroundings, and in indoor panoramic display it creates a natural light distribution, making it easier for the eye to capture image details and improving the viewing experience. In addition, the determination of the overlap region, the reasonable weight allocation and the edge smoothing achieve seamless stitching and fusion of the images, essentially eliminating gaps, misalignment and brightness or color jumps at the joints.
S60, post-processing is carried out on the fused image so as to obtain a final image.
In one embodiment, step S60 includes the steps S601-S604.
S601, identifying gaps and discontinuous areas in the fused image by using an edge detection algorithm.
In this embodiment, the Canny edge detection algorithm is selected as the basis and is optimized for the characteristics of the multi-lens fusion image. Because the fused image may retain weak edge traces from preprocessing such as registration and color correction, mixed with the object edges of the actual scene, the high and low thresholds are adjusted adaptively on top of the conventional Canny algorithm. First, the local gray-level gradient histogram of the image is computed; according to the histogram distribution, the high threshold is set dynamically to the 80th percentile of the gradient magnitude and the low threshold to the 20th percentile, so that strong edges are captured while weak edges are linked through the low threshold into complete contours. For example, in a panoramic city street-view image, the edges at the junction of buildings and sky and at the splicing positions between buildings can be located accurately through this optimization; the edge positioning precision reaches sub-pixel level, with an error controlled within ±0.3 pixel.
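A minimal sketch of this adaptive-threshold Canny step, assuming OpenCV as the implementation tool (the 80th/20th percentile rule follows the text), is:

```python
# Sketch only: Canny thresholds derived from percentiles of the gradient
# magnitude instead of fixed values.
import numpy as np
import cv2

def adaptive_canny(gray):
    """gray: HxW uint8 image. Returns a binary edge map."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    high = np.percentile(mag, 80)                  # strong-edge threshold
    low = np.percentile(mag, 20)                   # weak-edge threshold for linking
    return cv2.Canny(gray, low, high)
```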
In combination with multi-scale analysis, a Gaussian pyramid is used to decompose the image at multiple resolutions, edges are detected at each scale, and the results of all scales are fused, strengthening the detection of gaps and discontinuous areas of different sizes. Small flaws such as leaf gaps and discontinuities in building decoration lines are located precisely at the high-resolution scales, while large discontinuous areas such as splicing gaps between large billboards or building walls are outlined quickly at the coarse, low-resolution scales, improving both the completeness and the accuracy of the detection.
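A sketch of this multi-scale detection, assuming a three-level Gaussian pyramid and the same percentile threshold rule per level (both are illustrative choices, not fixed by the method), with per-level edge maps upsampled and OR-combined:

```python
# Sketch only: detect edges on each pyramid level and merge the results at
# full resolution so both fine and large discontinuities are captured.
import numpy as np
import cv2

def multiscale_edges(gray, levels=3):
    """gray: HxW uint8 image. Returns a combined full-resolution edge map."""
    h, w = gray.shape
    combined = np.zeros((h, w), dtype=np.uint8)
    level_img = gray.copy()
    for _ in range(levels):
        gx = cv2.Sobel(level_img, cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(level_img, cv2.CV_64F, 0, 1, ksize=3)
        mag = np.sqrt(gx ** 2 + gy ** 2)
        # Per-level adaptive thresholds (same percentile rule as above).
        edges = cv2.Canny(level_img, np.percentile(mag, 20), np.percentile(mag, 80))
        edges_up = cv2.resize(edges, (w, h), interpolation=cv2.INTER_NEAREST)
        combined = cv2.bitwise_or(combined, edges_up)
        level_img = cv2.pyrDown(level_img)         # next (coarser) pyramid level
    return combined
```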
S602, generating filling textures according to texture features of the peripheral area of the gap by adopting a texture synthesis algorithm, and filling the gap and the discontinuous area.
In this embodiment, texture features are extracted from the area surrounding the gap using the local binary pattern (LBP) combined with the gray-level co-occurrence matrix (GLCM). The LBP describes the local micro-texture structure of the image, while the GLCM quantifies the spatial correlation of pixel gray levels, yielding feature information such as texture direction, roughness and contrast. For example, in a multi-lens fused landscape image containing a lawn and a stone-slab road, if the gap lies in the stone-slab area, the surrounding normal slab texture is analyzed to extract key features such as the periodic gray-level variation (the co-occurrence frequency along a specific direction in the GLCM) and the local texture pattern (the LBP value distribution); the main texture direction is determined to be roughly horizontal and the average roughness is 0.4 (normalized).
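As a sketch of this feature-extraction step, assuming scikit-image as the implementation library and illustrative parameter choices (LBP radius, GLCM distances and angles are not fixed by the method):

```python
# Sketch only: LBP histogram for the local micro-texture plus GLCM statistics
# (contrast, homogeneity, dominant direction) for gray-level spatial correlation.
import numpy as np
from skimage.feature import local_binary_pattern, graycomatrix, graycoprops

def gap_neighbourhood_features(patch):
    """patch: uint8 grayscale region surrounding the gap."""
    lbp = local_binary_pattern(patch, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    glcm = graycomatrix(patch, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    contrast = graycoprops(glcm, "contrast").mean()
    homogeneity = graycoprops(glcm, "homogeneity").mean()
    # Dominant texture direction: angle with the highest co-occurrence energy.
    energy_per_angle = graycoprops(glcm, "energy")[0]
    main_direction = int(np.argmax(energy_per_angle))  # 0 = horizontal, 2 = vertical
    return lbp_hist, contrast, homogeneity, main_direction
```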
Suitable sample blocks for filling are then selected around the gap according to the extracted texture features. A search strategy based on feature similarity is adopted: regions whose texture-feature Euclidean distance is below a threshold, for example 0.5 (normalized distance), become candidate sample blocks, and blocks adjacent to the gap with consistent texture are selected preferentially to ensure a natural transition of the filled texture. When filling a stone-slab road gap, several 8×8 pixel sample blocks are selected from the adjacent intact slab region; their texture features closely match the gap surroundings, preventing abrupt textures from being introduced.
The selected sample blocks are filled into the gap area block by block, according to a best-match principle, using an image-block-based texture synthesis method. Following the idea of the PatchMatch algorithm, filling positions are initialized randomly, better matching blocks are then searched iteratively, and the filling result is updated continuously. In each iteration, the gradient difference and the texture-feature similarity at the boundary between the filled block and the already synthesized region are computed, and the filling direction and position are optimized to keep the filled texture continuous and smooth. For example, when a lawn gap is filled, the texture of the filled grass blades gradually blends with the surroundings as the iterations proceed; after 5-8 iterations the growth direction and density of the grass blades at the gap are consistent with the surroundings, the filling traces are hard to distinguish visually, and the consistency between the filled area and the original image exceeds 90%.
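The following greatly simplified sketch shows one block-filling step of an exemplar-based synthesis in the spirit described above; the full PatchMatch randomized search and iterative refinement are omitted, and the block size and candidate list are assumed inputs rather than part of the claimed method.

```python
# Sketch only: fill one missing block with the candidate sample block whose
# already-known border pixels best match the surroundings.
import numpy as np

def fill_block(image, mask, top_left, candidates, block=8):
    """image: HxW float array; mask: True where pixels are missing;
    top_left: (row, col) of the block to fill; candidates: list of block x block patches."""
    r, c = top_left
    region = image[r:r + block, c:c + block]
    hole = mask[r:r + block, c:c + block]
    known = ~hole                                  # border pixels that are already valid

    best_patch, best_cost = None, np.inf
    for patch in candidates:
        # Compare only against known pixels so the seam with existing texture stays smooth.
        cost = np.sum((patch[known] - region[known]) ** 2)
        if cost < best_cost:
            best_cost, best_patch = cost, patch

    region[hole] = best_patch[hole]                # copy texture into the missing pixels
    mask[r:r + block, c:c + block] = False         # mark the block as synthesized
    return image, mask
```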
S603, performing color temperature analysis on the white object images acquired by the center lens and the four fish-eye lenses, and constructing a white balance adjustment matrix.
In the initial calibration and periodic calibration stages of the multi-lens system, images of standard white objects (such as a Spectralon white board or the white area of a Kodak gray card) are captured specifically. The white object is lit uniformly, the illumination intensity is controlled at 500-1000 lux, the color temperature is kept close to the D65 (6500 K) light source condition, and the image resolution is not lower than 2000×2000 pixels, so that inaccurate color sampling caused by uneven light or low resolution is avoided.
A gray-world hypothesis correction is applied as preprocessing to the acquired white-object image so that its average gray value approaches ideal neutral gray (for example 128), removing the preliminary color shift caused by exposure deviation and other factors and facilitating the subsequent accurate color temperature analysis. For example, the initial average gray level of a white-board image taken by a fisheye lens is 110 and reaches 127 after linear stretching correction, improving the basic color accuracy.
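A minimal sketch of this preprocessing, assuming a simple linear gain toward the target neutral gray of 128 (the value quoted above):

```python
# Sketch only: scale the white-reference image so its mean gray value
# approaches neutral gray, removing the exposure-related offset.
import numpy as np

def gray_world_correct(img, target_gray=128.0):
    """img: HxWx3 float array of the captured white reference."""
    mean_gray = img.mean()
    gain = target_gray / max(mean_gray, 1e-6)      # e.g. 128/110 ~= 1.16 in the example
    return np.clip(img * gain, 0, 255)
```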
A color temperature estimation algorithm based on Planck's blackbody radiation law analyzes the relative radiant energy distribution of the white-object image in different wavebands (for example the 400-700 nm visible band with a 10 nm sampling interval) and estimates the color temperature of each lens by matching against standard color temperature spectral curves. For instance, the color temperature of the white object captured by the central lens is computed as 5800 K and that of a certain fisheye lens as 6200 K, both deviating from the standard D65.
A 3×3 white balance adjustment matrix is constructed according to the color temperature deviation of each lens. Using the Von Kries diagonal model, with the standard D65 color temperature as reference, the gain adjustment coefficients of the red, green and blue channels of each lens are computed and placed on the diagonal of the matrix. For example, for a fisheye lens with a higher color temperature, the blue channel gain is reduced appropriately (for example a coefficient of 0.9), the red channel gain is raised slightly (for example a coefficient of 1.05), and the green channel stays relatively stable (a coefficient of about 1.0), ensuring that the colors of the different lens images are consistent in the white balance dimension.
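A sketch of this matrix construction under the Von Kries diagonal model, assuming the reference white response and the measured per-lens white response are given as RGB triples (illustrative inputs, not fixed by the method):

```python
# Sketch only: per-channel gains are the ratio of the reference white response
# to the measured white response, placed on the diagonal of a 3x3 matrix.
import numpy as np

def von_kries_matrix(measured_white_rgb, reference_white_rgb=(255.0, 255.0, 255.0)):
    """measured_white_rgb: mean RGB of the white object seen through one lens."""
    gains = np.asarray(reference_white_rgb, dtype=float) / np.asarray(measured_white_rgb, dtype=float)
    return np.diag(gains)                          # e.g. diag([1.05, 1.0, 0.9]) for a cool lens
```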
S604, performing color balance adjustment on the filled image according to the white balance adjustment matrix.
The filled fused image is multiplied, pixel by pixel, by the white balance adjustment matrix. Let the RGB value of an image pixel be the vector [x, y, z] and the adjusted value be [x', y', z']; then the color value is changed pixel by pixel as [x', y', z'] = M · [x, y, z], where M is the white balance adjustment matrix. For example, the original RGB value of a certain pixel is [100, 120, 110]; after adjustment by the white balance matrix of the corresponding lens it becomes [103, 120, 107], compensating the color temperature difference so that white areas appear as true neutral white, other colors are balanced and coordinated, and the color error rate (compared with the standard colors) is reduced by about 80%.
During the adjustment, to avoid numerical overflow or precision loss from the matrix multiplication, dynamic range limitation and normalization are applied to the pixel values. The adjusted RGB values are kept within the legal range of 0-255; values beyond it are clipped or scaled linearly, preserving rich color gradations, preventing color blocking or banding, and ensuring high-quality color rendering of the final image.
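The per-pixel application of the matrix with range limitation can be sketched as follows (the clipping-to-[0, 255] policy follows the text; vectorizing the multiplication with NumPy is an implementation assumption):

```python
# Sketch only: multiply every RGB vector by the lens's white-balance matrix,
# then clip back to the legal 0-255 range to avoid overflow.
import numpy as np

def apply_white_balance(img, wb_matrix):
    """img: HxWx3 uint8 image; wb_matrix: 3x3 white-balance adjustment matrix."""
    flat = img.reshape(-1, 3).astype(np.float64)
    adjusted = flat @ wb_matrix.T                  # [x', y', z'] = M · [x, y, z] per pixel
    adjusted = np.clip(adjusted, 0, 255)           # dynamic-range limitation
    return adjusted.reshape(img.shape).astype(np.uint8)
```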
According to the invention, by performing distortion correction on the fisheye lens images and feature matching and registration with the central lens image, complex scenes such as weak texture, severe illumination change or dynamic scenes can be handled effectively, improving robustness and imaging quality.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
The embodiment of the application also provides a multi-lens large-field high-resolution imaging device which is used for executing the steps in any embodiment of the multi-lens large-field high-resolution imaging method. Specifically, referring to fig. 2, fig. 2 shows a schematic block diagram of a multi-lens large-field-of-view high-resolution imaging device 100 according to an embodiment of the present application, where the multi-lens large-field-of-view high-resolution imaging device 100 specifically includes:
An image acquisition unit 110 for synchronously acquiring images acquired by the center lens and each fish-eye lens. The distortion correction unit 120 is configured to perform distortion correction processing on the image collected by each fisheye lens based on the camera calibration parameters, so as to obtain a corrected fisheye image. And a motion feature matching unit 130 for performing motion feature matching on the corrected fisheye image and the central lens image. An image registration unit 140 for registering the corrected fisheye image to the angle of view of the central lens image based on the motion feature matching result. And a fusion processing unit 150, configured to perform fusion processing on the registered images. And a post-processing unit 160, configured to post-process the fused image to obtain a final image.
In one embodiment, the distortion correction unit 120 is specifically configured to obtain camera calibration parameters of each fisheye lens, select a distortion correction algorithm to correct the acquired fisheye image, and generate a corrected fisheye image.
In one embodiment, the motion feature matching unit 130 is specifically configured to detect feature points of the corrected fisheye image and the central lens image with a feature detector and compute feature descriptors of the pixels around each detected feature point, where resolution-related information between pixels is enhanced when computing the central lens image descriptors and field-of-view-related information is enhanced when computing the fisheye image descriptors; to compare the feature descriptors of the fisheye image with those of the central lens image and find all feature point pairs satisfying the similarity threshold; and to reject mismatched feature point pairs.
In an embodiment, the motion feature matching unit 130 is further specifically configured to randomly select part of the feature point pairs as a subset; construct a transformation model of the relative position parameters and distortion parameters of the central lens and the four fisheye lenses and solve the model parameters with the data of the subset; compute the re-projection errors of all matched feature point pairs under the transformation model and divide the matched feature points into inner points (re-projection error smaller than a set threshold) and outer points (re-projection error larger than the set threshold); re-estimate the transformation model with the inner points and update it as the iterations require, obtaining a plurality of transformation models; select, from the plurality of transformation models, the one with the largest number of inner points and the highest degree of matching with the layout and imaging characteristics of the central lens and the four fisheye lenses as the final transformation model, the degree of matching being measured by the difference between the model and the geometric optical models of the central lens and the fisheye lenses; and reject the feature point pairs that do not conform to the final transformation model.
In an embodiment, the image registration unit 140 is specifically configured to perform relative motion estimation on the corrected fisheye image and the central lens image according to the matched feature point pairs, perform estimation by using a motion estimation model based on an optical flow method in combination with stability of a mounting structure of the central lens and the four fisheye lenses and dynamic change conditions of a field of view overlapping area in the motion estimation process, construct a geometric transformation matrix according to a motion estimation result in combination with an optical imaging center coordinate relationship and a field of view angle relationship of the central lens and the four fisheye lenses, and perform geometric transformation and image compensation on the corrected fisheye image according to the geometric transformation matrix so as to align the corrected fisheye image and the central lens image in view angle.
In an embodiment, the image registration unit 140 is further specifically configured to perform coordinate transformation on each pixel point on the corrected fisheye image according to the transformation matrix, calculate a pixel value of the fisheye image after coordinate transformation by using interpolation, and map the pixel value of the fisheye image after transformation to the angle of view of the central lens image according to the interpolation result.
In an embodiment, the fusion processing unit 150 is specifically configured to obtain differences in spectral response characteristics of the center lens and the four fisheye lenses, perform color deviation analysis on images acquired by the fisheye lenses and the center lens under a standard color chart to construct a color correction matrix, perform color correction on the registered fisheye images and the center lens images according to the color correction matrix, obtain differences in illumination reflection of the center region and the peripheral region, calculate average brightness of different regions based on a brightness adjustment algorithm for region segmentation, perform brightness adjustment according to a preset brightness balance rule, analyze a lens field model based on a field-of-view overlapping theoretical model and an actual imaging calibration result of the center lens and the four fisheye lenses to calculate an overlapping region, fuse images in the overlapping region using a weight distribution strategy, dynamically adjust weight distribution according to image quality evaluation indexes and field-of-view importance of the center lens and the four fisheye lenses, and smooth edges of the fusion region.
In an embodiment, the post-processing unit 160 is further specifically configured to identify a gap and a discontinuous region in the fused image by using an edge detection algorithm, generate a filling texture according to texture features of a surrounding region of the gap by using a texture synthesis algorithm, fill the gap and the discontinuous region, construct a white balance adjustment matrix by performing color temperature analysis on white object images acquired by the center lens and the four fisheye lenses, and perform color balance adjustment on the filled image according to the white balance adjustment matrix.
It should be noted that, as those skilled in the art can clearly understand, the specific implementation process of the multi-lens large-field high-resolution imaging device 100 and each unit may refer to the corresponding description in the foregoing method embodiment, and for convenience and brevity of description, the description is omitted here.
The multi-lens large field of view high resolution imaging apparatus described above may be implemented in the form of a computer program which may be run on a computer device as shown in fig. 3.
Referring to fig. 3, fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 700 may be a server, where the server may be a stand-alone server or may be a server cluster formed by a plurality of servers.
As shown in fig. 3, the computer device includes a memory, a processor, and a computer program stored on the memory and executable on the processor, which when executed implements the steps of the multi-lens large field of view high resolution imaging method as described above.
The computer device 700 includes a processor 720, a memory, and a network interface 750, which are connected through a system bus 710, wherein the memory may include a non-volatile storage medium 730 and an internal memory 740.
The non-volatile storage medium 730 may store an operating system 731 and computer programs 732. The computer program 732, when executed, causes the processor 720 to perform a multi-lens large field of view high resolution imaging method.
The processor 720 is used to provide computing and control capabilities to support the operation of the overall computer device 700.
The internal memory 740 provides an environment for the execution of a computer program 732 in the non-volatile storage medium 730, which computer program 732, when executed by the processor 720, causes the processor 720 to perform a multi-lens large field of view high resolution imaging method.
The network interface 750 is used for network communications such as sending assigned tasks and the like. It will be appreciated by those skilled in the art that the structure shown in FIG. 3 is merely a block diagram of some of the structures associated with the present inventive arrangements and does not constitute a limitation of the computer device 700 to which the present inventive arrangements may be applied, and that a particular computer device 700 may include more or fewer components than shown, or may combine certain components, or may have a different arrangement of components. Wherein the processor 720 is configured to execute program code stored in the memory to implement a multi-lens large field of view high resolution imaging method.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 3 is not limiting of the specific construction of the computer device, and in other embodiments, the computer device may include more or less components than those shown, or certain components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 3, and will not be described again.
It should be appreciated that, in embodiments of the present application, the processor 720 may be a central processing unit (CPU), and the processor 720 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
In another embodiment of the present invention, a computer-readable storage medium is provided. The computer readable storage medium may be a non-volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program when executed by a processor implements the multi-lens large-field high-resolution imaging method disclosed by the embodiment of the invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, for example, the division of the units is merely a logical function division, there may be another division manner in actual implementation, or units having the same function may be integrated into one unit, for example, multiple units or components may be combined or may be integrated into another apparatus, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.