HK1200623A1 - Method for conducting real-time image recognition on mobile terminal and mobile terminal - Google Patents
- Publication number: HK1200623A1
- Application number: HK14112556.5A
- Authority: HK (Hong Kong)
- Prior art keywords: motion, frame, video frame, state, determining
Classifications
- H04N5/14 Picture signal circuitry for video frequency region (under H04N5/00 Details of television systems)
- H04N5/144 Movement detection
- H04N19/527 Global motion vector estimation (under H04N19/51 Motion estimation or motion compensation, within H04N19/00 coding, decoding, compressing or decompressing of digital video signals)
Abstract
A method and device for performing image identification are disclosed. The method includes: obtaining a sequence of video frames captured by a camera, including at least a first video frame captured prior to a second video frame; determining a respective motion state of the camera associated with each video frame of the sequence, including determining a first motion state of the camera associated with the second video frame by performing motion estimation; determining whether the camera has undergone a transition of motion states from a respective moving state to a respective stationary state between two consecutive video frames; and in accordance with a determination that the camera has undergone the transition of motion states from the respective moving state to the respective stationary state, determining whether the latter video frame of the two consecutive video frames is valid for uploading in accordance with predetermined uploading criteria.
Description
Technical Field
The present invention relates to image processing and recognition technology, and more particularly to a method for real-time image recognition on a mobile terminal, and to the corresponding mobile terminal.
Background
An existing scheme for real-time image recognition on a mobile terminal works as follows: the camera of the mobile terminal captures video frames of a target and sends them to a cloud server; the cloud server recognizes the received video frames, determines the corresponding description information, and feeds that information back to the mobile terminal for display.
For example, data may be collected on a variety of objects such as book covers, CD covers, movie posters, bar codes, two-dimensional codes, and product logos; after receiving the video frames, the cloud server feeds back related description information, such as purchase options and review information for the corresponding items. In this way the user can shoot and get results immediately, which is very quick.
Existing mobile terminals mainly use two modes for data acquisition and transmission, described below.
Mode one:
the target is photographed with the camera of the mobile terminal, and the resulting video frame is sent to the cloud server.
This mode has the following drawbacks: the user must first aim the camera and then trigger the shot manually, which is inconvenient; and if the target is not properly framed or the camera shakes, the cloud server cannot perform image recognition, so the mobile terminal cannot obtain the description information about the target.
Mode two:
no photograph is taken manually; instead, the whole picture captured by the camera is collected in real time and the collected image data is sent to the cloud server.
Although this mode needs no manual shooting and is convenient to operate, it has the following drawbacks: the collected video frames are sent to the cloud server in real time, which consumes a large amount of network traffic; moreover, some of the collected frames are not clear, the cloud server cannot recognize them, and no effective recognition result can be fed back.
Therefore, the existing methods for real-time image recognition on a mobile terminal consume a large amount of network traffic and cannot effectively feed back recognition results.
Disclosure of Invention
The invention provides a method for real-time image recognition on a mobile terminal, which can save network traffic and effectively feed back a recognition result.
The invention also provides a mobile terminal for real-time image recognition, which can save network traffic and effectively feed back a recognition result.
A method for real-time image recognition on a mobile terminal, the method comprising:
acquiring data in real time with the camera of the mobile terminal to obtain a video frame;
performing motion estimation on the video frame to determine the motion state of the video frame;
judging whether the motion state of the video frame is moving-to-still; if so, determining the video frame to be a clear frame image, and uploading the clear frame image to a cloud server;
and receiving the recognition result fed back by the cloud server, and displaying the recognition result.
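The flow of the method above can be sketched in a few lines of Python. This is only an illustrative skeleton: the names `frames`, `estimate_state`, and `upload` are hypothetical placeholders standing in for the camera, the motion-estimation step, and the cloud-server round trip, which the text describes abstractly.

```python
def recognition_loop(frames, estimate_state, upload):
    """Upload a frame to the cloud server only when its motion state is
    moving-to-still; all other frames are discarded, saving traffic."""
    results = []
    for frame in frames:
        # estimate_state is assumed to return one of:
        # "moving", "still", "moving_to_still", "still_to_moving"
        if estimate_state(frame) == "moving_to_still":
            results.append(upload(frame))  # recognition result fed back
    return results
```

A real terminal would pull frames from the camera continuously and display each returned result; the point here is only that the upload happens exclusively on the moving-to-still transition.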
A mobile terminal for real-time image recognition comprises a data acquisition unit, a motion estimation unit, a clear frame judgment unit and a recognition result display unit;
the data acquisition unit uses the camera of the mobile terminal to acquire data in real time, obtains a video frame, and sends the video frame to the motion estimation unit;
the motion estimation unit performs motion estimation on the video frame, determines the motion state of the video frame, and sends the motion state to the clear frame judgment unit;
the clear frame judgment unit judges whether the motion state of the video frame is moving-to-still; if so, it determines the video frame to be a clear frame image and uploads the clear frame image to the cloud server;
and the recognition result display unit receives the recognition result fed back by the cloud server and displays the recognition result.
In the above scheme, motion estimation is performed on the collected video frames to determine their motion states; when a video frame's motion state is judged to be moving-to-still, that frame is determined to be a clear frame image and uploaded to the cloud server. Because the camera acquires data actively, the user does not need to take photos manually, and operation is simple and convenient; moreover, only the clear frame images are sent to the cloud server, rather than streaming every collected frame in real time, which saves network traffic; and since the cloud server feeds back the recognition result based on a clear frame image, the recognition result is more effective.
Drawings
FIG. 1 is a schematic flow chart of a method for real-time image recognition at a mobile terminal according to the present invention;
FIG. 2 is a flowchart illustrating an example of a method for real-time image recognition at a mobile terminal according to the present invention;
FIG. 3 is a flowchart of an embodiment of a method for motion estimation according to the present invention;
FIG. 4 is a schematic diagram illustrating an example of block matching according to the present invention;
FIG. 5 is a schematic structural diagram of a mobile terminal for real-time image recognition according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the following embodiments and the accompanying drawings.
The inventor observed that, in actual use, the user first opens the camera and then moves it until it is aimed at the target, so the camera's data acquisition goes through a process from moving to still. Based on this, the motion state of each collected video frame is judged; when a frame's motion state is found to be moving-to-still, that frame is determined to be a clear frame image and uploaded to the cloud server. In this way only clear frame images are sent to the cloud server, saving network traffic; and the cloud server feeds back the recognition result based on a clear frame image, so the recognition result is more effective.
Referring to fig. 1, a schematic flow chart of the method for real-time image recognition in a mobile terminal according to the present invention includes the following steps:
and 101, acquiring data in real time by using a camera of the mobile terminal to obtain a video frame.
And 102, performing motion estimation on the video frame to determine the motion state of the video frame.
And the mobile camera acquires the pictures frame by frame, and performs motion estimation on a certain video frame acquired in real time to determine the motion state of the certain video frame.
Motion Estimation is known by the english name (Motion Estimation), and is often used in video coding techniques. The invention applies the motion estimation to the video frame collected by the camera of the mobile terminal for processing so as to determine the motion state of the video frame. In particular, motion vectors may be employed to determine video frame motion states, including: calculating a motion vector between the video frame and the previous video frame, wherein the motion vector comprises a motion amplitude and a motion direction; and determining the motion state of the video frame according to the motion vector.
The motion vector between the video frame and the previous video frame may be calculated by motion estimation as follows:
acquire the central-area pixels of the previous video frame;
taking the central area of the current video frame as a starting point, search around it for an area whose pixels are similar to the central-area pixels of the previous video frame, thereby determining a matching block;
take the position vector between the central area of the current video frame and the matching block as the motion vector.
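As a concrete illustration, the three steps above might look as follows in Python with NumPy. The block size, search radius, and the sum-of-squared-differences criterion are illustrative choices; the text only requires a similarity search around the central area.

```python
import numpy as np

def motion_vector(prev_frame, cur_frame, size=8, radius=4):
    """Find the offset (dx, dy) of the block in cur_frame that best matches
    the central block of prev_frame, using the sum of squared differences.
    Assumes the search window stays inside the frame."""
    h, w = prev_frame.shape
    y0, x0 = (h - size) // 2, (w - size) // 2
    ref = prev_frame[y0:y0 + size, x0:x0 + size].astype(float)
    best_score, best_vec = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cand = cur_frame[y0 + dy:y0 + dy + size,
                             x0 + dx:x0 + dx + size].astype(float)
            score = np.sum((cand - ref) ** 2)  # SSD similarity index
            if best_score is None or score < best_score:
                best_score, best_vec = score, (dx, dy)
    return best_vec  # the position vector used as the motion vector
```

The returned (dx, dy) gives the motion direction, and its Euclidean length gives the motion amplitude used by the state logic below.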
The motion states are: moving, still, moving-to-still, and still-to-moving. There are various ways to determine the motion state of a video frame from its motion vector, and they can be chosen according to actual needs; an example follows. Determining the motion state of the video frame from the motion vector includes:
reading the stored background motion state;
if the background motion state is still, and the motion amplitudes of N consecutive frames starting from the current frame (the current frame being the 1st frame, N being a natural number) are all greater than a first motion threshold, then the motion states of the 1st to Nth frames remain still and the background state remains still, the motion state of the (N+1)th frame is determined to be still-to-moving, and the background motion state is changed to moving; if the background motion state is still and the motion amplitude of the current frame is smaller than the first motion threshold, the motion state of the current frame is still and the background state remains still;
if the background motion state is moving, and the motion amplitudes of N consecutive frames starting from the current frame (the current frame being the 1st frame, N being a natural number) are all smaller than a second motion threshold, then the motion states of the 1st to Nth frames remain moving and the background state remains moving, the motion state of the (N+1)th frame is determined to be moving-to-still, and the background motion state is changed to still; if the background motion state is moving and the motion amplitude of the current frame is greater than the second motion threshold, the motion state of the current frame is moving and the background state remains moving.
Further, after determining that the background motion state is still and the motion amplitude of the current frame is smaller than the first motion threshold, the method further includes:
judging whether the motion amplitude is greater than a third motion threshold; if so, the motion of the current frame is determined to be micro-motion and the background state remains still; and if the motions of M consecutive frames starting from the current frame (the current frame being the 1st frame, M being a natural number) are all micro-motions in the same direction, the motion state of the Mth frame is determined to be still-to-moving and the background motion state is changed to moving.
When the background motion state is still: if, from the motion amplitudes, the two consecutive frames following the last video frame both have amplitudes greater than S1, and, from the motion directions, the two frames move in opposite directions, the frames are judged to be in a shaking state, and the motion states of both frames are determined to be still;
if the two consecutive frames both have motion amplitudes greater than S1 and their motion directions are the same, the later of the two frames is determined to be in the still-to-moving state.
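One possible simplified reading of this state logic as code is shown below. The thresholds, the value of N, and the exact accumulation rule are illustrative assumptions, since the text leaves them configurable; the essential behavior is that the background state only switches after N consecutive frames cross the relevant threshold, and S1 (still to moving) is lower than S2 (moving to still).

```python
class MotionStateTracker:
    """Simplified sketch of the background-state logic described above."""

    def __init__(self, s1=2.0, s2=5.0, n=2):
        self.background = "still"
        self.s1, self.s2, self.n = s1, s2, n
        self.count = 0  # consecutive frames contradicting the background state

    def update(self, amplitude):
        """Return the motion state of the current frame given its amplitude."""
        if self.background == "still":
            if amplitude > self.s1:
                self.count += 1
                if self.count > self.n:  # N frames exceeded S1, plus this one
                    self.background, self.count = "moving", 0
                    return "still_to_moving"
                return "still"  # state holding: no switch yet
            self.count = 0
            return "still"
        else:  # background is moving
            if amplitude < self.s2:
                self.count += 1
                if self.count > self.n:  # N frames fell below S2, plus this one
                    self.background, self.count = "still", 0
                    return "moving_to_still"
                return "moving"
            self.count = 0
            return "moving"
```

A frame whose `update` returns `"moving_to_still"` is the clear-frame candidate of step 103 below.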
Step 103: judge whether the motion state of the video frame is moving-to-still; if so, determine the video frame to be a clear frame image and upload the clear frame image to the cloud server.
If the motion state of the video frame is judged not to be moving-to-still, the frame is not uploaded to the cloud server.
Further, to improve the accuracy of the clear-frame judgment, corner detection can be performed after determining that the motion state of the video frame is moving-to-still:
calculate the corner feature count of the video frame;
judge whether the corner feature count is greater than a corner-count threshold; if so, determine the frame to be a clear frame image; otherwise, determine it to be a blurred frame image.
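A rough sketch of this clear-frame test follows, using a minimal Harris-style corner response in NumPy as a stand-in for whichever detector is chosen (the text later names FAST, Harris, and others as interchangeable); the smoothing window, the constant k, and both thresholds are illustrative assumptions.

```python
import numpy as np

def corner_count(gray, k=0.04, thresh=0.01):
    """Count corner-like pixels via a crude Harris response."""
    gy, gx = np.gradient(gray.astype(float))
    ixx, iyy, ixy = gx * gx, gy * gy, gx * gy

    def box(a):  # 3x3 box smoothing of the structure tensor
        out = np.zeros_like(a)
        out[1:-1, 1:-1] = sum(a[1 + dy:a.shape[0] - 1 + dy,
                                1 + dx:a.shape[1] - 1 + dx]
                              for dy in (-1, 0, 1) for dx in (-1, 0, 1))
        return out

    sxx, syy, sxy = box(ixx), box(iyy), box(ixy)
    r = (sxx * syy - sxy ** 2) - k * (sxx + syy) ** 2  # Harris response
    cutoff = thresh * (r.max() if r.max() > 0 else 1)
    return int(np.sum(r > cutoff))

def is_clear_frame(gray, corner_threshold=10):
    """A frame counts as 'clear' when it has enough corner features."""
    return corner_count(gray) > corner_threshold
```

A blank or single-color picture yields no corners and is rejected as a blurred/invalid frame, while a textured picture passes; this is exactly the decision rule of the judgment above.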
Generally, when the motion state of a video frame is judged to be moving-to-still, that frame is determined to be a clear frame image and uploaded to the cloud server. In some application environments, the timing for uploading a clear frame image can instead be based on the motion states of multiple consecutive video frames being still; specifically, assuming the current frame is the 1st frame, if the 1st to (N+1)th frames are all in the still state, the (N+1)th frame is determined to be a clear frame and its image is uploaded to the cloud server; N is a natural number.
Step 104: receive the recognition result fed back by the cloud server, and display the recognition result.
After receiving the video frame, the cloud server feeds back related description information, such as purchase options and review information for the related items.
In the invention, motion estimation is performed on the collected video frames to determine their motion states; when a frame's motion state is judged to be moving-to-still, the frame is determined to be a clear frame image and uploaded to the cloud server. Because the camera acquires data actively, the user does not need to take photos manually, and operation is simple and convenient; moreover, only the clear frame images are sent to the cloud server rather than every collected frame in real time, which saves network traffic; and the cloud server feeds back the recognition result based on a clear frame image, so the recognition result is more effective.
An example of the method for real-time image recognition on a mobile terminal according to the present invention is shown in FIG. 2 and comprises the following steps:
Step 201: use the camera of the mobile terminal to acquire data in real time, obtaining a video frame.
Step 202: perform motion estimation on the video frame to determine the motion state of the video frame.
For convenience of explanation, the video frame undergoing motion estimation is referred to as the frame to be processed.
In the invention, the motion-estimation idea used in video coding is transplanted to processing the images from the mobile terminal camera; since a coded video and the camera's image sequence share the property of correlation between consecutive images, the motion estimation algorithm can be reused. There are, however, differences: the resolution of images acquired by a mobile terminal camera is often low, and in actual use the user does not move the terminal with a very large amplitude; more importantly, video coding uses global motion estimation, whose computation is very slow and cannot achieve real-time performance even on a PC. Therefore, in view of these differences, the motion estimation algorithm used in video coding is improved so that it performs very efficiently on a variety of mobile terminals while consuming very little CPU, to the point where the CPU cost is essentially negligible. Referring to FIG. 3, an example flow of the motion estimation method of the present invention includes the following steps:
step 301, obtaining and storing the central area pixel of the video frame to be processed.
Step 302, a central area pixel of a previous video frame of the video frame to be processed is obtained.
After the mobile terminal collects a video frame each time, storing the central area pixel of the video frame; specifically, the pixel gradation values of the central area are stored. In this step, the pixel gray value of the central area of the stored video frame immediately adjacent to the video frame to be processed is extracted.
Step 303: taking the central area of the frame to be processed as a starting point, search around it for an area similar to the central area of the previous frame, thereby determining a matching block.
The method of determining the matching block is described in detail with reference to FIG. 4. In the figure, the hatched square area in the previous video frame is its central area, and the dotted area in the frame to be processed is that frame's central area. A limited neighborhood around the dotted frame is searched from the inside outwards to find the area most similar in pixel gray values to the central area of the previous frame; that area is called the matching block, and the hatched square in the frame to be processed is the matching block found.
In this example, the pixel gray value at position (x, y) of the central area of the previous video frame is denoted I(x, y), and the candidate block in the frame to be processed that is matched against it is denoted I'(x, y). The sum of the squared differences between the two is used as the index of block similarity; assuming the block size is N by N pixels, the error sum of squares S is:

S = sum over x = 1..N, y = 1..N of [ I(x, y) - I'(x, y) ]^2
calculating the block with the minimum S according to the formula to be used as a matching block; the motion vector between two frames is determined according to the position of the matching block to the central region of the previous video frame, and the arrow in fig. 4 indicates the motion direction. The searching process adopts an approximation algorithm, specifically, large step length movement is firstly carried out, and an area with relatively small similarity is found; and then reducing the step size in the region, and gradually approaching to obtain a final search result. To ensure the speed of the algorithm, if the pixels of the video frame are too large and exceed a certain threshold, a down-sampling process may be performed, for example, down-sampling a 2000 by 2000 data frame to 400 by 400. In fig. 4, the matching block is represented by a rectangular region; in practical application, the matching can be performed by using the shape blocks such as diamond matching, circular matching and the like.
In the motion estimation, besides the above sum-of-squared-errors measure, other similarity measures such as the mean squared error, the sum of absolute differences, or the mean absolute difference may be used. In practice, other search algorithms, such as the three-step search or the diamond search, may also be adopted.
Step 304, calculating a position vector between the central area of the video frame to be processed and the matching block as a motion vector.
The calculated motion vector contains the direction of motion and the magnitude of motion.
Step 305: determine the motion state of the video frame from the motion vector.
In the invention, the motion states of a video frame are mainly the following four: moving, still, moving-to-still, and still-to-moving; moving-to-still is taken as the timing to upload an image.
In practice, different amplitude thresholds are needed for the moving-to-still and still-to-moving transitions. In image recognition applications the amplitude threshold for moving-to-still is usually higher, and is denoted the second motion threshold; the amplitude threshold for still-to-moving is lower, and is denoted the first motion threshold. The first motion threshold is less than the second motion threshold.
The mobile terminal stores a background motion state, which can be extracted from storage. The motion state of the frame to be processed is then determined by combining the background motion state with the first and second motion thresholds. Specifically, the method comprises the following steps:
reading the stored background motion state;
if the background motion state is still, and the motion amplitudes of N consecutive frames starting from the current frame (the current frame being the 1st frame, N being a natural number) are all greater than the first motion threshold, then the motion states of the 1st to Nth frames remain still and the background state remains still, the motion state of the (N+1)th frame is determined to be still-to-moving, and the background motion state is changed to moving; if the background motion state is still and the motion amplitude of the current frame is smaller than the first motion threshold, the motion state of the current frame is still and the background state remains still;
if the background motion state is moving, and the motion amplitudes of N consecutive frames starting from the current frame (the current frame being the 1st frame, N being a natural number) are all smaller than the second motion threshold, then the motion states of the 1st to Nth frames remain moving and the background state remains moving, the motion state of the (N+1)th frame is determined to be moving-to-still, and the background motion state is changed to still; if the background motion state is moving and the motion amplitude of the current frame is greater than the second motion threshold, the motion state of the current frame is moving and the background state remains moving.
After determining that the background motion state is still and the motion amplitude of the current frame is smaller than the first motion threshold, the method further includes:
judging whether the motion amplitude is greater than a third motion threshold; if so, the motion of the current frame is determined to be micro-motion and the background state remains still; and if the motions of M consecutive frames starting from the current frame (the current frame being the 1st frame, M being a natural number) are all micro-motions in the same direction, the motion state of the Mth frame is determined to be still-to-moving and the background motion state is changed to moving.
This example adopts a "state holding" strategy: no state switch is made for an occasional single still or moving frame; a switch is made only after the state change has accumulated over more than a set number of frames, which keeps the state stable. Let the first motion threshold be S1, the second motion threshold S2, the third motion threshold S3, and the motion amplitude of the frame to be processed S. Assume that, in general, two accumulated state changes trigger a switch, while for micro-motion five accumulated changes are required. The corresponding "state holding" strategy is as follows:
One) cases where the background motion state is still:
1) when S > S1, the frame to be processed (denoted the Yth frame) is still determined to be in the still state and the background state remains still; it is then judged whether the motion amplitude of the (Y+1)th frame is also greater than S1, and if so, the (Y+1)th frame is determined to be in the still-to-moving state and the background motion state is changed to moving;
2) when S < S1, the frame to be processed is determined to be in the still state and the background state remains still;
3) when S3 < S < S1, the frame to be processed (denoted the Zth frame) is determined to be a micro-motion; if the Zth to (Z+3)th frames are all judged to be micro-motions in the same direction, they are still determined to be in the still state, and if the (Z+4)th frame is also a micro-motion in the same direction, the (Z+4)th frame is determined to be in the still-to-moving state and the background motion state is changed to moving. The number of accumulated frames can be set as needed.
Two) cases where the background motion state is moving:
1) when S < S2, the frame to be processed (denoted the Yth frame) is still determined to be in the moving state and the background state remains moving; it is then judged whether the motion amplitude of the (Y+1)th frame is also less than S2, and if so, the (Y+1)th frame is determined to be in the moving-to-still state and the background motion state is changed to still;
2) when S > S2, the frame to be processed is determined to be in the moving state and the background state remains moving.
Further, a hand-trembling condition can also be detected: if the motion direction flips back and forth, that is, consecutive motion vectors point in opposite directions, a "hand trembling" situation is determined; in that case, if the background is in the still state, the motion state is not modified for the time being, until motion continues in the same direction.
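This check can be written compactly. Treating "opposite directions" as a negative dot product between consecutive motion vectors is an assumption made here for illustration, since the text only says the directions are opposite; the amplitude threshold is likewise illustrative.

```python
def is_hand_tremor(v1, v2, amp_threshold=2.0):
    """Two consecutive motion vectors with large amplitude but roughly
    opposite directions are taken as hand trembling, not real movement."""
    amp1 = (v1[0] ** 2 + v1[1] ** 2) ** 0.5
    amp2 = (v2[0] ** 2 + v2[1] ** 2) ** 0.5
    dot = v1[0] * v2[0] + v1[1] * v2[1]  # negative when directions oppose
    return amp1 > amp_threshold and amp2 > amp_threshold and dot < 0
```

When this returns True and the background state is still, the tracker simply keeps the still state rather than switching to moving.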
Step 306: judge whether to continue the motion estimation; if so, return to step 301; otherwise, end the flow.
If video frames continue to be acquired in step 201, this step continues to perform motion estimation on the newly acquired frames.
Step 203: judge whether the motion state of the video frame is moving-to-still; if so, execute step 204; otherwise, end the flow.
When the camera has just been turned on, the state can default to still; the user then moves the camera to the target, and this process goes through still-to-moving and then moving-to-still.
If the motion state of a video frame is judged to be moving-to-still, the corresponding frame is taken as the frame to be detected.
Step 204: calculate the corner feature count of the frame to be detected.
Various corner detection algorithms exist, such as the FAST, Harris, CHOG, and FREAK detectors, and any of them may be selected, as they all have good corner detection capability. By the definition of a valid picture, it must first be clear and second have reasonably rich texture; on these two points, the FAST corner detector may be employed. When a picture is unclear, the number of FAST corners is often small (for example in a blank picture or a picture of a single color), so judging the FAST corner count of a picture suffices to determine whether it is a valid picture.
Besides judging picture validity with a corner detection algorithm, in practice algorithms based on gradient features, edge features, and the like may also be used.
Step 205: judge whether the corner feature count is greater than the corner-count threshold; if so, determine the frame to be a clear frame image and upload the clear frame image to the cloud server; otherwise, determine it to be a blurred frame image.
Step 206: receive the recognition result fed back by the cloud server, and display the recognition result.
Referring to fig. 5, which is a schematic structural diagram of a mobile terminal for real-time image recognition according to the present invention, the mobile terminal includes a data acquisition unit, a motion estimation unit, a clear frame determination unit, and a recognition result display unit;
the data acquisition unit uses the camera of the mobile terminal to acquire data in real time, obtains a video frame, and sends the video frame to the motion estimation unit;
the motion estimation unit performs motion estimation on the video frame, determines the motion state of the video frame, and sends the motion state to the clear frame judgment unit;
the clear frame judging unit judges whether the motion state of the video frame is moving-to-still; if so, it determines the video frame to be a clear frame image and uploads the clear frame image to the cloud server;
and the recognition result display unit receives the recognition result fed back by the cloud server and displays the recognition result.
Preferably, the motion estimation unit comprises a motion vector calculation subunit and a state determination subunit;
the motion vector calculation subunit calculates a motion vector between a video frame and a previous video frame, and sends the motion vector to the state determination subunit; the motion vector comprises a motion amplitude and a motion direction;
and the state determining subunit determines the motion state of the video frame according to the motion vector.
Preferably, the state determining subunit includes a state determining module, which reads the stored background motion state. If the background motion state is stationary and the motion amplitudes of N consecutive frames starting from the current frame are all greater than a first motion threshold, where N is a natural number and the current frame is the 1st frame, the motion states of the 1st through Nth frames remain stationary and the background motion state remains stationary; the motion state of the (N+1)th frame is then determined to be stationary-to-moving, and the background motion state is modified to moving. If the background motion state is stationary and the motion amplitude of the current frame is smaller than the first motion threshold, the motion state of the current frame is stationary and the background motion state remains stationary.
If the background motion state is moving and the motion amplitudes of N consecutive frames starting from the current frame are all smaller than a second motion threshold, where N is a natural number and the current frame is the 1st frame, the motion states of the 1st through Nth frames remain moving and the background motion state remains moving; the motion state of the (N+1)th frame is then determined to be moving-to-stationary, and the background motion state is modified to stationary. If the background motion state is moving and the motion amplitude of the current frame is greater than the second motion threshold, the motion state of the current frame remains moving and the background motion state remains moving.
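Under one reading of this rule (the transition being declared on the (N+1)th consecutive frame that contradicts the stored background state), the state determining module could be sketched as follows. The thresholds and N are illustrative placeholders, not values from this disclosure.

```python
class MotionStateMachine:
    """Per-frame motion-state tracking with N-frame hysteresis (sketch)."""

    def __init__(self, t1=5.0, t2=2.0, n=3):
        self.t1, self.t2, self.n = t1, t2, n   # first/second thresholds, N
        self.background = "stationary"          # stored background motion state
        self.count = 0                          # consecutive contradicting frames

    def update(self, amplitude):
        """Return the motion state assigned to the frame with this amplitude."""
        if self.background == "stationary":
            if amplitude > self.t1:
                self.count += 1
                if self.count > self.n:         # the (N+1)th such frame
                    self.background = "moving"
                    self.count = 0
                    return "stationary-to-moving"
                return "stationary"             # frames 1..N stay stationary
            self.count = 0
            return "stationary"
        else:                                   # background state is "moving"
            if amplitude < self.t2:
                self.count += 1
                if self.count > self.n:
                    self.background = "stationary"
                    self.count = 0
                    return "moving-to-stationary"
                return "moving"                 # frames 1..N stay moving
            self.count = 0
            return "moving"
```

Using separate thresholds `t1 > t2` gives the hysteresis the text describes: brief amplitude spikes or dips shorter than N frames never flip the background state.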
Preferably, when the state determining module determines that the background motion state is stationary and the motion amplitude of the current frame is smaller than the first motion threshold, it further judges whether the motion amplitude is greater than a third motion threshold; if so, the motion of the current frame is micro-motion and the background motion state remains stationary. If the motions of M consecutive frames starting from the current frame are all micro-motions in the same direction, where M is a natural number and the current frame is the 1st frame, the motion state of the Mth frame is determined to be stationary-to-moving and the background motion state is modified to moving.
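The micro-motion rule above might be sketched as a standalone helper. The `(amplitude, direction)` frame representation and the thresholds `t1`, `t3` and count `m` are illustrative assumptions; direction could be, e.g., a quantized motion-vector angle.

```python
def detect_micro_motion_transition(frames, t1, t3, m):
    """Scan (amplitude, direction) pairs for the micro-motion rule.

    Amplitudes strictly between t3 and t1 count as micro-motion.
    Returns the index of the frame at which M consecutive same-direction
    micro-motions complete (a stationary-to-moving transition), else None.
    """
    run, run_dir = 0, None
    for i, (amp, direction) in enumerate(frames):
        if t3 < amp < t1:                    # micro-motion band
            if direction == run_dir:
                run += 1                     # same direction: extend the run
            else:
                run, run_dir = 1, direction  # direction changed: restart
            if run >= m:
                return i                     # Mth consecutive frame fires
        else:
            run, run_dir = 0, None           # too small or too large: reset
    return None
```

This captures the intent stated above: slow, consistent panning accumulates into a transition, while jittery direction changes keep resetting the run.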
Preferably, the motion vector calculation subunit includes a motion vector determination module, which obtains the pixels of the central area of the previous video frame; taking the central area of the current video frame as a starting point, it searches around that central area for an area whose pixels are similar to those of the central area of the previous video frame and takes it as the matching block; the position vector between the central area of the current video frame and the matching block is taken as the motion vector.
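A minimal block-matching sketch of this module follows, using the sum of absolute differences (SAD) over a small search window; the block size and search radius are illustrative placeholders.

```python
import numpy as np

def center_motion_vector(prev, curr, block=16, search=8):
    """Match the centre block of the previous frame inside a search window
    of the current frame; the best SAD offset is the motion vector."""
    h, w = prev.shape
    y0, x0 = (h - block) // 2, (w - block) // 2
    template = prev[y0:y0 + block, x0:x0 + block].astype(np.int32)
    best, best_dxy = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + block > h or x + block > w:
                continue                      # candidate block leaves the frame
            cand = curr[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(cand - template).sum())
            if best is None or sad < best:
                best, best_dxy = sad, (dx, dy)
    dx, dy = best_dxy
    # The motion amplitude is the length of the displacement vector.
    return (dx, dy), float(np.hypot(dx, dy))
```

The returned amplitude is what the state determining module would compare against its thresholds, and the `(dx, dy)` pair supplies the motion direction.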
Preferably, the clear frame judging unit comprises a motion-to-still determining module and a corner detecting module;
the motion-to-still determining module is used for judging whether the motion state of the video frame is motion-to-still or not, and if so, sending a starting instruction to the corner point detecting module;
the corner detection module receives the start instruction from the motion-to-still determination module and calculates the number of corner features of the video frame; it judges whether the number of corner features is greater than a corner-number threshold; if so, the video frame is determined to be a clear frame image and is uploaded to the cloud server; otherwise, it is determined to be a blurred frame image.
The mobile terminal described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media that facilitate transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store program code in the form of instructions or data structures and that can be read by a general-purpose or special-purpose computer or processor. In addition, any connection is properly termed a computer-readable medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (13)
1. A method for real-time image recognition at a mobile terminal, the method comprising:
acquiring data in real time by using a camera of the mobile terminal to obtain a video frame;
carrying out motion estimation on the video frame to determine the motion state of the video frame;
judging whether the motion state of the video frame is from motion to still; if so, determining the video frame to be a clear frame image, and uploading the clear frame image to a cloud server;
and receiving the recognition result fed back by the cloud server, and displaying the recognition result.
2. The method of claim 1, wherein said performing motion estimation on a video frame to determine a motion state of the video frame comprises:
calculating a motion vector between the video frame and the previous video frame, wherein the motion vector comprises a motion amplitude and a motion direction; and determining the motion state of the video frame according to the motion vector.
3. The method of claim 2, wherein said determining a motion state of the video frame from the motion vectors comprises:
reading the stored background motion state;
if the background motion state is stationary and the motion amplitudes of N consecutive frames starting from the current frame are all greater than a first motion threshold, where N is a natural number and the current frame is the 1st frame, the motion states of the 1st through Nth frames remain stationary and the background motion state remains stationary, the motion state of the (N+1)th frame is determined to be stationary-to-moving, and the background motion state is modified to moving;
if the background motion state is moving and the motion amplitudes of N consecutive frames starting from the current frame are all smaller than a second motion threshold, where N is a natural number and the current frame is the 1st frame, the motion states of the 1st through Nth frames remain moving and the background motion state remains moving, the motion state of the (N+1)th frame is determined to be moving-to-stationary, and the background motion state is modified to stationary.
4. The method of claim 3, wherein after determining that the background motion state is stationary and the current frame motion amplitude is less than the first motion threshold, the method further comprises:
and judging whether the motion amplitude is larger than a third motion threshold value, if so, determining that the motion of the current frame is micro-motion and the background motion state is still static, and if the motion of the continuous M frames from the current frame is micro-motion in the same direction and the current frame is the 1 st frame, determining the motion state of the M frame as static to motion, modifying the background motion state into motion, wherein M is a natural number.
5. The method of claim 3, wherein after determining that the background motion state is stationary, the method comprises:
and if the motion amplitude of two continuous frames after the last video frame is known to be larger than the first motion threshold value and the motion direction of the two continuous frames is known to be opposite, determining that the two continuous frames are in a shaking state, determining the motion states of the two continuous frames to be still and determining the background motion state to be still.
6. The method of claim 2, wherein said calculating the motion vector between the video frame and its previous video frame comprises:
acquiring a central area pixel of a previous video frame;
taking the central area of the video frame as a starting point, searching around it for an area whose pixels are similar to those of the central area of the previous video frame, and determining that area as a matching block;
the position vector between the central region of the video frame and the matching block is taken as the motion vector.
7. The method of any of claims 1 to 6, wherein after determining that the video frame motion state is moving to stationary, the method further comprises:
calculating the number of corner features of the video frame;
judging whether the number of corner features is greater than a corner-number threshold; if so, determining the video frame to be a clear frame image; otherwise, determining it to be a blurred frame image.
8. A mobile terminal for real-time image recognition is characterized by comprising a data acquisition unit, a motion estimation unit, a clear frame judgment unit and a recognition result display unit;
the data acquisition unit acquires data in real time by using a mobile terminal camera to acquire a video frame and sends the video frame to the motion estimation unit;
the motion estimation unit is used for carrying out motion estimation on the video frame, determining the motion state of the video frame and sending the motion state to the clear frame judgment unit;
the clear frame judging unit judges whether the motion state of the video frame is from motion to still, if so, the video frame is determined to be a clear frame image, and the clear frame image is uploaded to a cloud server;
and the recognition result display unit receives the recognition result fed back by the cloud server and displays the recognition result.
9. The mobile terminal of claim 8, wherein the motion estimation unit includes a motion vector calculation subunit and a state determination subunit;
the motion vector calculation subunit calculates a motion vector between a video frame and a previous video frame, and sends the motion vector to the state determination subunit; the motion vector comprises a motion amplitude and a motion direction;
and the state determining subunit determines the motion state of the video frame according to the motion vector.
10. The mobile terminal of claim 9, wherein the state determination subunit includes a state determination module to read a stored background motion state; if the background motion state is stationary and the motion amplitudes of N consecutive frames starting from the current frame are all greater than a first motion threshold, where N is a natural number and the current frame is the 1st frame, the motion states of the 1st through Nth frames remain stationary and the background motion state remains stationary, the motion state of the (N+1)th frame is determined to be stationary-to-moving, and the background motion state is modified to moving;
if the background motion state is moving and the motion amplitudes of N consecutive frames starting from the current frame are all smaller than a second motion threshold, where N is a natural number and the current frame is the 1st frame, the motion states of the 1st through Nth frames remain moving and the background motion state remains moving, the motion state of the (N+1)th frame is determined to be moving-to-stationary, and the background motion state is modified to stationary.
11. The mobile terminal of claim 10, wherein when the state determining module determines that the background motion state is stationary and the motion amplitude of the current frame is smaller than the first motion threshold, it further judges whether the motion amplitude is greater than a third motion threshold; if so, the motion of the current frame is micro-motion and the background motion state remains stationary; and if the motions of M consecutive frames starting from the current frame are all micro-motions in the same direction, where M is a natural number and the current frame is the 1st frame, the motion state of the Mth frame is determined to be stationary-to-moving and the background motion state is modified to moving.
12. The mobile terminal of claim 9, wherein the motion vector calculation subunit includes a motion vector determination module that obtains the pixels of the central area of the previous video frame; taking the central area of the video frame as a starting point, it searches around that central area for an area whose pixels are similar to those of the central area of the previous video frame and determines that area as a matching block; the position vector between the central area of the video frame and the matching block is taken as the motion vector.
13. The mobile terminal of any of claims 8 to 12, wherein the clear frame determination unit comprises a motion-to-still determination module and a corner detection module;
the motion-to-still determining module is used for judging whether the motion state of the video frame is motion-to-still or not, and if so, sending a starting instruction to the corner point detecting module;
the corner detection module receives the start instruction from the motion-to-still determination module and calculates the number of corner features of the video frame; it judges whether the number of corner features is greater than a corner-number threshold; if so, the video frame is determined to be a clear frame image and is uploaded to the cloud server; otherwise, it is determined to be a blurred frame image.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201310428930.2A CN104144345B (en) | 2013-09-18 | 2013-09-18 | Method for conducting real-time image recognition on mobile terminal and mobile terminal |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1200623A1 true HK1200623A1 (en) | 2015-08-07 |
| HK1200623B HK1200623B (en) | 2018-05-04 |
Also Published As
| Publication number | Publication date |
|---|---|
| WO2015039575A1 (en) | 2015-03-26 |
| JP6026680B1 (en) | 2016-11-16 |
| TWI522930B (en) | 2016-02-21 |
| TW201512996A (en) | 2015-04-01 |
| JP2016537692A (en) | 2016-12-01 |
| CN104144345B (en) | 2016-08-17 |
| CN104144345A (en) | 2014-11-12 |
| SA114350742B1 (en) | 2015-08-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| TWI522930B (en) | Method for recognizing images in real time in mobile terminal and mobile terminal thereof | |
| EP2619728B1 (en) | An adaptable framework for cloud assisted augmented reality | |
| KR101399804B1 (en) | Method and apparatus for tracking and recognition with rotation invariant feature descriptors | |
| US9805331B2 (en) | Smartphone-based asset management system | |
| US10032286B2 (en) | Tracking objects between images | |
| KR101722803B1 (en) | Method, computer program, and device for hybrid tracking of real-time representations of objects in image sequence | |
| US9204112B2 (en) | Systems, circuits, and methods for efficient hierarchical object recognition based on clustered invariant features | |
| CN104756155B (en) | Systems and methods for merging multiple maps for computer vision based tracking | |
| US9330471B2 (en) | Camera aided motion direction and speed estimation | |
| WO2017096949A1 (en) | Method, control device, and system for tracking and photographing target | |
| WO2019042419A1 (en) | Image tracking point acquisition method and device, and storage medium | |
| JP2011008687A (en) | Image processor | |
| US9105101B2 (en) | Image tracking device and image tracking method thereof | |
| KR20130025944A (en) | Method, apparatus and computer program product for providing object tracking using template switching and feature adaptation | |
| JP2015184810A (en) | Image processing apparatus, image processing method, and image processing program | |
| KR20150082417A (en) | Method for initializing and solving the local geometry or surface normals of surfels using images in a parallelizable architecture | |
| JP2017016592A (en) | Main subject detection device, main subject detection method and program | |
| JP2015069548A (en) | Attitude parameter estimation device, attitude parameter estimation method, and program | |
| JP6930389B2 (en) | Image collectors, programs, and methods | |
| HK1200623B (en) | Method for conducting real-time image recognition on mobile terminal and mobile terminal | |
| JP6468642B2 (en) | Information terminal equipment | |
| Chen et al. | Low-cost asset tracking using location-aware camera phones | |
| JP6000809B2 (en) | TRACKING DEVICE, TRACKING METHOD, AND PROGRAM |