US12299968B2 - Cascaded neural network-based attention detection method, computer device, and computer-readable storage medium - Google Patents
- Publication number
- US12299968B2 (application US17/631,083 / US201917631083A)
- Authority
- US
- United States
- Prior art keywords
- neural network
- situation
- inattention
- region
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/809—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/59—Context or environment of the image inside of a vehicle, e.g. relating to seat occupancy, driver state or inner lighting conditions
- G06V20/597—Recognising the driver's state or behaviour, e.g. attention or drowsiness
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
Definitions
- the present invention relates to the field of image recognition processing, and in particular, to an attention detection method based on a cascade neural network, a computer apparatus and a computer-readable storage medium for implementing the method.
- Human attention detection has long been a focus and hot topic in the field of machine learning, and is mainly used in areas such as security protection and assisted driving. Due to a large number of uncertain factors in a real environment, such as different illumination conditions (e.g., day and night), the diversity of human head postures and expressions, differences in race, gender and age, and whether a person wears glasses, detecting a human attention state in a real environment is quite challenging.
- remote eye tracking is a classical algorithm for detecting human attention.
- this method needs to rely on a near-infrared lighting device to produce a bright pupil effect, so as to capture eye information.
- the near-infrared lighting device used in this method is easily damaged due to vibration and bumps, and needs long-term maintenance, which is relatively high in cost.
- a method of attention estimation based on a convolutional neural network may be used to automatically learn information of head posture features and eye features from data samples, without manually designing a feature extraction algorithm, and has good robustness.
- a model of the convolutional neural network used is large in size and high in computational complexity, which is not suitable for an embedded device, resulting in great limitations on the use of this method.
- a main objective of the present invention is to provide an attention detection method based on a cascade neural network, which has low computational complexity and good computational performance.
- Another objective of the present invention is to provide a computer apparatus for implementing the foregoing attention detection method based on a cascade neural network.
- Still another objective of the present invention is to provide a computer-readable storage medium for implementing the foregoing attention detection method based on a cascade neural network.
- an attention detection method based on a cascade neural network includes: obtaining video data, recognizing a plurality of image frames, and extracting a face region of the plurality of image frames; recognizing the face region by using a first convolutional neural network to judge whether a first situation of inattention occurs; and recognizing, if it is confirmed that no first situation of inattention occurs, the face region by using a second convolutional neural network to judge whether a second situation of inattention occurs, where computational complexity of the first convolutional neural network is less than computational complexity of the second convolutional neural network.
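The cascade decision above can be sketched in a few lines of Python. The function and network names here are illustrative stand-ins for the two trained classifiers, not part of the claimed implementation:

```python
def detect_attention(face_region, first_net, second_net):
    """Cascade dispatch: the cheap first network runs on every frame; the
    more expensive second network runs only when the first one finds no
    inattention. `first_net` and `second_net` return True when their
    respective situation of inattention is detected."""
    if first_net(face_region):       # first situation, e.g. large head rotation
        return "first_inattention"
    if second_net(face_region):      # finer-grained gaze/eye analysis
        return "second_inattention"
    return "attentive"
```

Because the first network short-circuits the pipeline, the second network's higher computational cost is paid only on the subset of frames that pass the preliminary check.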
- the recognizing the face region by using a second convolutional neural network includes: capturing a plurality of kinds of regions of interest from the face region, and judging whether a second situation of inattention occurs according to results of recognition of two or more kinds of regions of interest.
- the plurality of kinds of regions of interest include a face frame region and a face supplement region; and the judging whether a second situation of inattention occurs according to results of recognition of two or more kinds of regions of interest includes: judging whether a second situation of inattention occurs according to results of image recognition of the face frame region and the face supplement region.
- the plurality of kinds of regions of interest include a face frame region and an eye region; and the judging whether a second situation of inattention occurs according to results of recognition of two or more kinds of regions of interest includes: judging whether a second situation of inattention occurs according to results of image recognition of the face frame region and the eye region.
- the judging whether a first situation of inattention occurs includes: recognizing the face region by using the first convolutional neural network, judging whether a rotation angle of the head in a preset direction is greater than a preset angle, and if so, confirming that the first situation of inattention occurs.
- the second convolutional neural network includes a first convolution layer, a depthwise convolution layer, a plurality of bottleneck residual layers, a second convolution layer, a linear global depthwise convolution layer, a linear convolution layer, a fully connected layer and a classification layer that are sequentially cascaded.
- the recognizing a plurality of image frames after obtaining the video data includes: selecting one image frame from every preset number of consecutive image frames of the video data for recognition.
- a computer apparatus includes a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, each step of the foregoing attention detection method based on a cascade neural network is implemented.
- a computer-readable storage medium stores a computer program, where when the computer program is executed by a processor, each step of the foregoing attention detection method based on a cascade neural network is implemented.
- recognition is first performed by using the first convolutional neural network to judge whether a first situation of inattention occurs, and only when it is confirmed that no first situation of inattention occurs is recognition performed by using the second convolutional neural network to judge whether a second situation of inattention occurs. This avoids using the more computationally complex convolutional neural network in all situations, thereby reducing the overall complexity of the attention detection.
- the face region is divided into a plurality of kinds of regions of interest, the kinds of regions of interest are recognized separately, and fusion analysis is then performed on the recognition results to judge whether the second situation of inattention occurs. This may improve the accuracy of the analysis and better recognize inattention.
- by recognizing a face frame region and a face supplement region separately, a plurality of attention situations, such as a driver looking at the left rearview mirror, looking at the front, looking at the vehicle interior rearview mirror, looking at the right rearview mirror, looking at the dashboard, looking at the central control region, or closing the eyes, may be recognized.
- by using the results of recognition of the face frame region and the face supplement region, it can be judged whether the driver is fatigued, is distracted while driving, or intends to change lanes. When several consecutive face images fall into the eye-closing situation, it can be judged that the driver is fatigued while driving.
- when several consecutive face images fall into a situation of the head being deviated to the left or right side, or of looking at the left or right side, it can be considered that the second situation of inattention occurs.
- the design of the first convolutional neural network is very simple, and its computation amount is small.
- the second convolutional neural network is more complex in computation and can accurately recognize whether other situations of inattention occur to the driver, so its judgment is more accurate.
- one image frame is selected from every set number of consecutive image frames for recognition, which greatly reduces the computation amount of inattention recognition while still ensuring the accuracy of the recognition results.
- one image frame may be selected from every six image frames for recognition.
- FIG. 1 is a flowchart of an embodiment of an attention detection method based on a cascade neural network according to the present invention
- FIG. 2 shows a formula for calculating a softmax probability value of a value in an embodiment of an attention detection method based on a cascade neural network according to the present invention
- FIG. 3 is a structural block diagram of a first convolutional neural network in an embodiment of an attention detection method based on a cascade neural network according to the present invention
- FIG. 4 is a schematic diagram of four regions of interest for image recognition by using an embodiment of an attention detection method based on a cascade neural network according to the present invention
- FIG. 5 is a structural block diagram of a second convolutional neural network in an embodiment of an attention detection method based on a cascade neural network according to the present invention
- FIG. 6 is a structural block diagram when a step size of a bottleneck residual layer of a second convolutional neural network is 1 in an embodiment of an attention detection method based on a cascade neural network according to the present invention.
- FIG. 7 is a structural block diagram when a step size of a bottleneck residual layer of a second convolutional neural network is 2 in an embodiment of an attention detection method based on a cascade neural network according to the present invention.
- an attention detection method based on a cascade neural network is applied to an intelligent device.
- the intelligent device is provided with a photographing apparatus, such as a camera.
- the intelligent device uses video data obtained by the photographing apparatus to perform image analysis, so as to judge whether a situation of inattention occurs to a specific person.
- the intelligent device is provided with a processor and a memory storing a computer program, where the processor implements the attention detection method based on a cascade neural network by executing the computer program.
- This embodiment is mainly based on head postures and eye information, and a cascade convolutional neural network is applied to detect attention of a specific person.
- the entire method mainly includes three steps: video collection, image processing, and attention detection.
- In the video collection step, video data is photographed by the photographing apparatus. In this embodiment, recognition can be performed on the basis of video data from different scenarios (including different photographing angles, external illumination conditions, positions of the target, and the like). Therefore, the photographing apparatus can obtain target video data in various different postures.
- In the image processing step, a plurality of image frames are obtained from the video data, the frames are detected by using a face detection algorithm, and images of the face region are captured.
- a first convolutional neural network with low computational complexity is first used to judge head postures of a detected object, so as to achieve a preliminary judgment of attention detection; then, the detected face region of interest is further captured and expanded, a second convolutional neural network with high computational complexity is used to extract information of head posture features and eye features, and human behavior is judged by analyzing the human gaze direction.
- the cascade convolutional neural network adopted in this embodiment has good generalization performance and low computational complexity, and is suitable for an embedded device.
- Step S 1 is performed first to obtain video data, that is, the photographing apparatus of the intelligent device obtains consecutive video data.
- the intelligent device may be a device disposed in a vehicle to detect whether the driver is inattentive, and the photographing apparatus may be disposed in front of or diagonally in front of the driver's seat, for example, below a sun visor of the driver's seat or above a center console.
- the photographing apparatus may start recording a video after a vehicle engine is started, and transmit obtained consecutive video data to the processor, and the processor processes the video data.
- step S 2 is performed to recognize an image and extract a face region in the image. Because the video data obtained in step S 1 includes a plurality of consecutive image frames, step S 2 recognizes the plurality of received image frames. However, because consecutive image frames are very similar, recognizing every frame would lead to a very large computation amount, and the recognition results of adjacent frames are often the same. Therefore, in this embodiment, one image frame may be selected from every set number of consecutive image frames for recognition. For example, one image frame is selected from every six or eight image frames, that is, face detection is performed on this image frame, and the detected face region is captured.
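The frame-sampling step described above amounts to simple stride indexing. A minimal sketch, where the step size of 6 is the example value given in the embodiment:

```python
def sample_frames(frames, step=6):
    """Keep one frame out of every `step` consecutive frames, so only
    the kept frames undergo face detection and recognition."""
    return frames[::step]
```

For video at typical frame rates, sampling every sixth frame cuts the recognition workload roughly sixfold while adjacent kept frames remain close enough in time for reliable judgments.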
- the process of face detection is a process of confirming positions, sizes and postures of all faces in an image under the assumption that one or more faces exist in the input image. This process may be implemented by using the currently known face detection algorithm, and will not be described in detail herein.
- a cascade convolutional neural network is used to recognize images.
- the cascade convolutional neural network includes a first convolutional neural network and a second convolutional neural network, where the first convolutional neural network is used to judge head postures of the detected object, so as to achieve the preliminary judgment of attention, that is, to judge whether the first situation of inattention occurs.
- the second convolutional neural network is used to extract the information of the head posture features and the eye features, and then judge human behavior by analyzing the human gaze direction to implement attention detection.
- step S 3 is performed first to recognize the face region by using the first convolutional neural network.
- an attention concentration state of the detected object is judged by using a detection model which is trained in advance, for example, a situation in which the head of the detected object is rotated by more than a certain angle may be set to be a first situation of inattention.
- a situation in which the driver turns the head leftward (by more than 60°), turns the head rightward (by more than 60°), raises the head (by more than 60°) or lowers the head (by more than 60°) is a situation of inattention.
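The threshold rule above can be expressed directly; the parameter names (yaw for left/right rotation, pitch for up/down) are illustrative conventions, not terms from the patent:

```python
def first_inattention(yaw_deg, pitch_deg, threshold_deg=60.0):
    """First situation of inattention: the head is rotated beyond the
    preset angle in any monitored direction (left/right via yaw,
    up/down via pitch)."""
    return abs(yaw_deg) > threshold_deg or abs(pitch_deg) > threshold_deg
```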
- the first convolutional neural network is a convolutional neural network with a small size and low computational complexity.
- the first convolutional neural network of this embodiment includes several convolution layers, several pooling layers, a fully connected layer 16 and a classification layer 17.
- Each pooling layer is located between two adjacent convolution layers.
- a dashed box 11 includes units formed by combining a plurality of convolution layers and pooling layers. Each unit includes one convolution layer and one pooling layer, and an output of the last pooling layer is input to the convolution layer 15 , so that the number of the convolution layers is greater than the number of the pooling layers by one.
- two kinds of parameters of a plurality of convolution layers are provided.
- One kind of parameters of the convolution layers is as follows: m filters are provided, a convolution kernel size is k 1 ⁇ k 1 , and a step size pixel is S 1 ; and the other kind of parameters of the convolution layers is as follows: n filters are provided, a convolution kernel size is k 2 ⁇ k 2 , and a step size pixel is S 2 .
- Each pooling layer samples the output of the previous convolution layer.
- the fully connected layer 16 is configured to implement the process of transforming a two-dimensional feature matrix output by the convolution layer 15 into a one-dimensional feature vector.
- the classification layer 17 uses a softmax function to map the outputs of a plurality of neurons into the (0, 1) interval, which may be understood as a probability distribution. Assuming that the probability distribution vector is P and P_i denotes the i-th value in P, the definition of the softmax probability of this value is shown in the formula of FIG. 2.
- a maximum value is found in P, and a category corresponding to i with the highest probability is used as a detection result.
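The softmax-then-argmax classification described above can be sketched with NumPy; the max-subtraction is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    """Map raw network outputs (logits) into the (0, 1) interval so they
    sum to 1, forming the probability distribution vector P."""
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

def classify(z):
    """Return the index i with the highest softmax probability, used as
    the detection result."""
    return int(np.argmax(softmax(z)))
```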
- the detection result is whether the rotation angle of the driver's head exceeds the preset angle.
- step S 4 is performed to judge whether the detection result of step S 3 is that the rotation angle of the driver's head exceeds the preset angle. If so, it is confirmed that the first situation of inattention occurs to the driver. In this case, step S 9 is performed to send warning information, such as voice warning information.
- step S 5 is performed first to capture a plurality of kinds of regions of interest from the face region.
- a first kind of region of interest is implemented by directly using the field of vision captured by the embedded device; the corresponding image information is a part between the rearview mirror and the mirror on the left of the driver, i.e., the image part in dashed box 21.
- the human attention can be judged directly by using image information without face detection operations.
- a second kind of region of interest is implemented by using a known face detection algorithm to detect and capture a face frame as the input image, i.e., the image region within solid box 22.
- the second kind of region of interest may be referred to as the face frame region.
- a third kind of region of interest is implemented by expanding the detected face frame in four directions (up, down, left and right), adding information of additional face-adjacent parts to the second kind of region of interest, such as the image region in solid box 23 in the figure.
- the third kind of region of interest may be referred to as the face supplement region.
- additional auxiliary features are added, which not only confirm the position of the human head but also provide very good robustness.
- a fourth kind of region of interest is implemented by capturing only the upper half of the face based on the second kind of region of interest, such as the image region in solid box 24 in FIG. 4. The fourth kind of region of interest is therefore the eye region, and is mainly used to judge the driver's attention from eye information.
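The four kinds of regions of interest can be derived from one detected face box as below. The expansion ratio for the face supplement region is an illustrative assumption; the patent does not specify the exact amount:

```python
def regions_of_interest(image_w, image_h, face_box, expand=0.25):
    """Return the four ROI boxes as (x1, y1, x2, y2) tuples:
    1) the full field of view, 2) the face frame, 3) the face frame
    expanded in all four directions (face supplement region),
    4) the upper half of the face frame (eye region)."""
    x1, y1, x2, y2 = face_box
    dw, dh = (x2 - x1) * expand, (y2 - y1) * expand
    full_view  = (0, 0, image_w, image_h)               # 1st kind
    face_frame = face_box                               # 2nd kind
    face_suppl = (max(0, x1 - dw), max(0, y1 - dh),     # 3rd kind, clipped
                  min(image_w, x2 + dw), min(image_h, y2 + dh))
    eye_region = (x1, y1, x2, (y1 + y2) / 2)            # 4th kind: upper half
    return full_view, face_frame, face_suppl, eye_region
```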
- step S 6 is performed to recognize the plurality of kinds of regions of interest by using the second convolutional neural network.
- an attention detection model which is trained in advance is used to recognize the plurality of kinds of regions of interest and classify the recognition results.
- the attention of the detected object falls into seven categories, namely, looking at a left rearview mirror, looking at the front, looking at a vehicle interior rearview mirror, looking at a right rearview mirror, looking at a dashboard, looking at a central control region, and closing eyes.
- the attention of the detected object may fall into six categories, namely, looking at the left side, looking at the right side, looking at the front, looking above, looking down, and closing eyes.
- the second convolutional neural network in this embodiment includes a first convolution layer 31, a depthwise convolution layer 32, several bottleneck residual layers, a second convolution layer 35, a linear global depthwise convolution (linear GDConv) layer 36, a linear convolution (linear Conv) layer 37, a fully connected layer 38, and a classification layer 39 that are sequentially cascaded.
- a dashed box in FIG. 5 denotes a unit composed of a plurality of bottleneck residual layers.
- the plurality of bottleneck residual layers include bottleneck residual layers 33 , 34 , and the like.
- the bottleneck residual layer is repeated n_i times, the channel expansion factor at each layer is t_i, and the step size is s_i.
- parameters of the first convolution layer 31 and the second convolution layer 35 may be different.
- Parameters of one convolution layer are as follows: m filters are provided, a convolution kernel size is k 1 ⁇ k 1 , and a step size pixel is S 1 ; and parameters of the other convolution layer are as follows: n filters are provided, a convolution kernel size is k 2 ⁇ k 2 , and a step size pixel is S 2 .
- the depthwise convolution layer 32 performs a convolution operation on each input channel with the convolution kernel of the corresponding channel. Assuming that the input dimension is m with a size of w × h, the layer has m filters with a convolution kernel size of k × k, and a depthwise convolution operation is used; the output dimension is then m with a size of w′ × h′.
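As a concrete illustration of per-channel convolution, a depthwise convolution can be written out with plain NumPy loops. This is a didactic sketch with "valid" padding, not the optimized layer used in the network:

```python
import numpy as np

def depthwise_conv(x, kernels, stride=1):
    """Depthwise convolution: each of the m input channels is convolved
    only with its own kernel, so the output also has m channels.
    x: (m, h, w); kernels: (m, k, k); no padding ('valid')."""
    m, h, w = x.shape
    _, k, _ = kernels.shape
    oh = (h - k) // stride + 1
    ow = (w - k) // stride + 1
    out = np.zeros((m, oh, ow))
    for c in range(m):                  # one kernel per channel, no cross-channel mixing
        for i in range(oh):
            for j in range(ow):
                patch = x[c, i*stride:i*stride+k, j*stride:j*stride+k]
                out[c, i, j] = np.sum(patch * kernels[c])
    return out
```

Compared with a standard convolution, which sums over all input channels for every filter, this costs a factor of roughly m fewer multiply-accumulates, which is why such layers suit embedded devices.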
- Each bottleneck residual layer includes a convolution unit, a depthwise convolution unit, and a residual unit.
- the depthwise convolution unit is configured to receive an output of the convolution unit.
- when the step size of the convolution unit is 1, the residual computation of the bottleneck residual layer is implemented.
- as shown in FIG. 6, when the step size of the convolution unit is 1 and c′ channels are provided, the values on corresponding channels of the input and output are added to implement the residual computation; that is, the input data passes through a first convolution unit 41, a depthwise convolution unit 42, a second convolution unit 43 and a residual unit 44 that are sequentially cascaded, and the accumulation of input and output is implemented at the residual unit 44.
- when the step size is 2, the input dimension is [w, h] and the output dimension is [w′, h′]. Because the input dimension is not equal to the output dimension, residual computation is not performed in this case.
- the structural block diagram in this case is shown in FIG. 7 .
- the input passes through a first convolution unit 51 , a depthwise convolution unit 52 and a second convolution unit 53 in sequence and then is output.
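The residual rule contrasted in FIG. 6 and FIG. 7 condenses to a shape check. Here `block` is a hypothetical stand-in for the cascaded expand/depthwise/project convolutions of the bottleneck layer:

```python
import numpy as np

def bottleneck_residual(x, block, stride):
    """Apply the bottleneck transformation; add the skip connection only
    when stride is 1 and the block preserves the input shape (FIG. 6).
    With stride 2 the spatial size shrinks, so the block output is
    returned without a residual addition (FIG. 7)."""
    y = block(x)
    if stride == 1 and y.shape == x.shape:
        return x + y        # element-wise addition on corresponding channels
    return y
```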
- the convolution kernel size of the linear global depthwise convolution layer 36 is the same as the input size: the input has n channels of size k × k, m filters with a convolution kernel size of k × k are provided, and the output has m channels of size 1 × 1.
- the linear convolution layer 37 is a convolution layer in a special form, and uses a linear function as its activation function.
- a calculation process of the fully connected layer 38 is a process of converting a two-dimensional feature matrix output by an upper layer into a one-dimensional feature vector, and the output dimension is the same as the number of classifications.
- the calculation method for the classification layer 39 is the same as the calculation method for the classification layer 17 of the first convolutional neural network, and is not described in detail herein.
- step S 7 is performed to perform fusion analysis on the results of recognition of the four kinds of regions of interest obtained in step S 6, so as to obtain a fusion analysis result.
- the face detection algorithm is first used to detect the face region, and a corresponding face frame image is captured to achieve the classification of the face frame; then the captured face frame is expanded in four directions to obtain a new image, which is classified in turn, and whether the driver is distracted while driving or intends to change lanes can be judged by using the results of the two classifications.
- the face detection algorithm is first used to detect the face region, and a corresponding face frame image is captured to achieve the classification of the face frame. Then the upper half of the face frame is retained, that is, information of the eye region is obtained, and the eye information is classified. Fusion analysis is performed on the two classification results, so that it can be judged whether the driver is fatigued, is distracted while driving, or intends to change lanes. When several consecutive face images fall into the eye-closing situation, it can be judged that the driver is fatigued while driving.
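The consecutive-frames fatigue rule above can be sketched as a run-length check over per-frame classification labels; the window length of 3 and the label string are illustrative assumptions:

```python
def is_fatigued(frame_labels, n_consecutive=3):
    """Fusion rule sketch: judge the driver as fatigued when at least
    `n_consecutive` consecutive sampled frames are classified as
    'closing_eyes'. Any other label resets the run."""
    run = 0
    for label in frame_labels:
        run = run + 1 if label == "closing_eyes" else 0
        if run >= n_consecutive:
            return True
    return False
```

Requiring a run of consecutive frames, rather than a single frame, filters out ordinary blinks that the per-frame classifier would otherwise flag.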
- this method may also be applied to other scenarios, such as the detection of students' attention in class.
- step S 8 is then performed to judge whether a second situation of inattention, such as fatigued driving or distracted driving, occurs according to the fusion analysis result of step S 7. If so, step S 9 is performed to send warning information; otherwise, step S 10 is performed to predict the driver's behavior, such as intending to change lanes to the left. The result of prediction may be provided to other algorithms for use.
- a situation of vehicles coming from behind on the left side, such as whether there is a moving vehicle within a certain distance behind on the left, can be detected, so as to send indication information to the driver.
- the second convolutional neural network may be replaced with a more lightweight network architecture with strong computing capability, such as ShuffleNet, or the bottleneck residual layers of the convolutional neural network may be reduced and the model retrained.
- the computer apparatus of this embodiment may be an intelligent device, such as a vehicle-mounted monitoring instrument with an image processing capability.
- the computer apparatus includes a processor, a memory, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, each step of the foregoing attention detection method based on a cascade neural network is implemented.
- the computer program may be divided into one or more modules, and the one or more modules are stored in the memory and executed by the processor to implement the present invention.
- the one or more modules may be a series of computer instruction segments capable of performing a particular function, and the instruction segments are used to describe the execution process of the computer program in a terminal device.
- the processor in the present invention may be a central processing unit (CPU), or may be other general-purpose processors, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and the like.
- the general-purpose processor may be a microprocessor or the processor may be any conventional processor, or the like.
- the processor is a control center of the terminal device, and uses various interfaces and lines to connect various parts of the entire terminal device.
- the memory may be configured to store a computer program and/or a module, and the processor implements various functions of the terminal device by running or executing the computer program and/or the module stored in the memory and invoking data stored in the memory.
- the memory may mainly include a program storage region and a data storage region, where the program storage region may store an operating system, application programs required for at least one function (such as a sound playing function and an image playing function), and the like; and the data storage region may store data (such as audio data and an address book) and the like created according to the use of a mobile phone.
- the memory may include a high-speed random access memory, and may further include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk memory device, a flash memory device, or other non-volatile solid-state memory devices.
- if implemented in the form of a software functional unit and sold or used as an independent product, the computer program stored in the computer apparatus may be stored in a computer-readable storage medium. Based on such an understanding, all or some of the processes in the method of the foregoing embodiment of the present invention may be implemented by instructing related hardware through a computer program.
- the computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, each step of the foregoing attention detection method based on a cascaded neural network can be implemented.
- the computer program includes computer program code, and the computer program code may be in the form of source code, in the form of object code, an executable file or in some intermediate forms, or the like.
- the computer-readable storage medium may include any entity or apparatus capable of carrying computer program code, a recording medium, a USB flash disk, a mobile hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electrical carrier signals, telecommunications signals, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable storage medium may be appropriately increased or decreased according to requirements of legislation and patent practice in a jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable storage medium does not include electrical carrier signals or telecommunications signals.
- a cascaded convolutional neural network is used for recognition.
- a first convolutional neural network has relatively low computational complexity and may be used to analyze a simple scenario and judge whether a first situation of inattention occurs to the driver. This reduces the computation amount of the entire cascaded convolutional neural network, so the model is small in size and low in computational complexity.
- the head posture information is first used to make a preliminary judgment on whether attention is concentrated, and then the head posture and eye information are used to further detect the driver's attention.
- when the driver's attention is detected, four methods are adopted to process an original image to obtain four kinds of regions of interest, and the classification results are fused to analyze human behavior and intention. Therefore, the cascaded convolutional neural network of the present invention has good generalization performance and low computational complexity, and is suitable for an embedded device.
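The fusion of classification results from the four regions of interest can be sketched as follows. This is a hedged illustration: the region names (full frame, face, head, eyes) and the fusion rule (simple probability averaging) are assumptions for demonstration; the patent does not fix them here.

```python
# Hypothetical sketch of fusing per-region classification results
# (region names and the averaging fusion rule are assumed, not
# prescribed by the patent text).

def fuse_roi_scores(roi_scores: dict) -> str:
    """Average per-class probabilities across regions of interest and
    return the winning class label."""
    classes = roi_scores[next(iter(roi_scores))].keys()
    fused = {c: sum(s[c] for s in roi_scores.values()) / len(roi_scores)
             for c in classes}
    return max(fused, key=fused.get)

scores = {
    "full_frame": {"attentive": 0.6, "distracted": 0.4},
    "face":       {"attentive": 0.2, "distracted": 0.8},
    "head":       {"attentive": 0.3, "distracted": 0.7},
    "eyes":       {"attentive": 0.1, "distracted": 0.9},
}
print(fuse_roi_scores(scores))  # -> distracted
```

A weighted or learned fusion could replace the plain average; averaging is shown only because it is the simplest rule consistent with "classification results are fused".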
Abstract
Description
Claims (9)
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2019/098407 WO2021016873A1 (en) | 2019-07-30 | 2019-07-30 | Cascaded neural network-based attention detection method, computer device, and computer-readable storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20220277558A1 US20220277558A1 (en) | 2022-09-01 |
| US12299968B2 true US12299968B2 (en) | 2025-05-13 |
Family
ID=69088299
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US17/631,083 Active 2041-06-10 US12299968B2 (en) | 2019-07-30 | 2019-07-30 | Cascaded neural network-based attention detection method, computer device, and computer-readable storage medium |
Country Status (3)
| Country | Link |
|---|---|
| US (1) | US12299968B2 (en) |
| CN (1) | CN110678873A (en) |
| WO (1) | WO2021016873A1 (en) |
Families Citing this family (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US12299968B2 (en) * | 2019-07-30 | 2025-05-13 | Allwinner Technology Co., Ltd. | Cascaded neural network-based attention detection method, computer device, and computer-readable storage medium |
| CN111310705A (en) * | 2020-02-28 | 2020-06-19 | 深圳壹账通智能科技有限公司 | Image recognition method and device, computer equipment and storage medium |
| CN111563468B (en) * | 2020-05-13 | 2023-04-07 | 电子科技大学 | Driver abnormal behavior detection method based on attention of neural network |
| CN111739027B (en) * | 2020-07-24 | 2024-04-26 | 腾讯科技(深圳)有限公司 | Image processing method, device, equipment and readable storage medium |
| US20230290134A1 (en) * | 2020-09-25 | 2023-09-14 | Intel Corporation | Method and system of multiple facial attributes recognition using highly efficient neural networks |
| CN112580458B (en) * | 2020-12-10 | 2023-06-20 | 中国地质大学(武汉) | Facial expression recognition method, device, equipment and storage medium |
| CN113076884B (en) * | 2021-04-08 | 2023-03-24 | 华南理工大学 | Cross-mode eye state identification method from near infrared light to visible light |
| CN113408466A (en) * | 2021-06-30 | 2021-09-17 | 东风越野车有限公司 | Method and device for detecting bad driving behavior of vehicle driver |
| CN113869225A (en) * | 2021-09-29 | 2021-12-31 | 深圳市优必选科技股份有限公司 | A face detection method, device and electronic device |
| CN114112984B (en) * | 2021-10-25 | 2022-09-20 | 上海布眼人工智能科技有限公司 | Fabric fiber component qualitative method based on self-attention |
| CN114067440B (en) * | 2022-01-13 | 2022-04-26 | 深圳佑驾创新科技有限公司 | Pedestrian detection method, device, equipment and medium of cascade neural network model |
| CN114581438B (en) * | 2022-04-15 | 2023-01-17 | 深圳市海清视讯科技有限公司 | MRI image classification method, device, electronic device and storage medium |
| CN117197415B (en) * | 2023-11-08 | 2024-01-30 | 四川泓宝润业工程技术有限公司 | Method, device and storage medium for detecting target in inspection area of natural gas long-distance pipeline |
Citations (13)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP4967559B2 (en) | 2006-09-19 | 2012-07-04 | 株式会社豊田中央研究所 | Doze driving prevention device and program |
| CN108664947A (en) | 2018-05-21 | 2018-10-16 | 五邑大学 | A kind of fatigue driving method for early warning based on Expression Recognition |
| US20190065873A1 (en) * | 2017-08-10 | 2019-02-28 | Beijing Sensetime Technology Development Co., Ltd. | Driving state monitoring methods and apparatuses, driver monitoring systems, and vehicles |
| CN109598174A (en) | 2017-09-29 | 2019-04-09 | 厦门歌乐电子企业有限公司 | The detection method and its device and system of driver status |
| US20190138268A1 (en) * | 2017-11-08 | 2019-05-09 | International Business Machines Corporation | Sensor Fusion Service to Enhance Human Computer Interactions |
| CN109740477A (en) | 2018-12-26 | 2019-05-10 | 联创汽车电子有限公司 | Study in Driver Fatigue State Surveillance System and its fatigue detection method |
| US20190283762A1 (en) * | 2010-06-07 | 2019-09-19 | Affectiva, Inc. | Vehicle manipulation using cognitive state engineering |
| US20210012128A1 (en) * | 2019-03-18 | 2021-01-14 | Beijing Sensetime Technology Development Co., Ltd. | Driver attention monitoring method and apparatus and electronic device |
| US20220129664A1 (en) * | 2020-10-27 | 2022-04-28 | National Cheng Kung University | Deepfake video detection system and method |
| US20220180110A1 (en) * | 2020-12-03 | 2022-06-09 | Shenzhen Horizon Robotics Technology Co., Ltd. | Fatigue State Detection Method and Apparatus, Medium, and Electronic Device |
| US20220277558A1 (en) * | 2019-07-30 | 2022-09-01 | Allwinner Technology Co., Ltd. | Cascaded Neural Network-Based Attention Detection Method, Computer Device, And Computer-Readable Storage Medium |
| US20220327845A1 (en) * | 2021-04-09 | 2022-10-13 | Stmicroelectronics S.R.L. | Method of processing signals indicative of a level of attention of a human individual, corresponding system, vehicle and computer program product |
| US20230154207A1 (en) * | 2020-06-10 | 2023-05-18 | Nanjing University Of Science And Technology | Driver fatigue detection method and system based on combining a pseudo-3d convolutional neural network and an attention mechanism |
2019
- 2019-07-30 US US17/631,083 patent/US12299968B2/en active Active
- 2019-07-30 CN CN201980001324.8A patent/CN110678873A/en active Pending
- 2019-07-30 WO PCT/CN2019/098407 patent/WO2021016873A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN110678873A (en) | 2020-01-10 |
| US20220277558A1 (en) | 2022-09-01 |
| WO2021016873A1 (en) | 2021-02-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12299968B2 (en) | Cascaded neural network-based attention detection method, computer device, and computer-readable storage medium | |
| Wang et al. | A survey on driver behavior analysis from in-vehicle cameras | |
| Vora et al. | Driver gaze zone estimation using convolutional neural networks: A general framework and ablative analysis | |
| CN109584507B (en) | Driving behavior monitoring method, device, system, vehicle and storage medium | |
| CN105354986B (en) | Driver's driving condition supervision system and method | |
| CN107533754B (en) | Reducing Image Resolution in Deep Convolutional Networks | |
| EP3910507B1 (en) | Method and apparatus for waking up screen | |
| CN110705392A (en) | A face image detection method and device, and storage medium | |
| EP4024270A1 (en) | Gesture recognition method, electronic device, computer-readable storage medium, and chip | |
| CN113283338A (en) | Method, device and equipment for identifying driving behavior of driver and readable storage medium | |
| Ragab et al. | A visual-based driver distraction recognition and detection using random forest | |
| CN111860316B (en) | Driving behavior recognition method, device and storage medium | |
| CN117786520B (en) | Training method and application of target perception model, unmanned vehicle and storage medium | |
| CN114299473A (en) | Driver behavior identification method based on multi-source information fusion | |
| CN113537176A (en) | Method, device and equipment for determining fatigue state of driver | |
| Ai et al. | Double attention convolutional neural network for driver action recognition | |
| Du et al. | A visual recognition method for the automatic detection of distracted driving behavior based on an attention mechanism | |
| CN110837760A (en) | Target detection method, training method and device for target detection | |
| Thornton et al. | Machine learning techniques for vehicle matching with non-overlapping visual features | |
| CN119502686B (en) | Emotion recognition-based head-up display adjustment method, system, equipment and medium | |
| TW202326624A (en) | Embedded deep learning multi-scale object detection model using real-time distant region locating device and method thereof | |
| Yazici et al. | System-on-chip based driver drowsiness detection and warning system | |
| CN119888693A (en) | Neural network model-based distraction driving behavior detection method and related device | |
| Srivastava et al. | Driver’s Face Detection in Poor Illumination for ADAS Applications | |
| CN115496977B (en) | Target detection method and device based on multi-mode sequence data fusion |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
| FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
| AS | Assignment |
Owner name: ALLWINNER TECHNOLOGY CO., LTD., CHINA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LI, XIAOHUI;PENG, GANG;NAN, NAN;AND OTHERS;REEL/FRAME:059498/0978 Effective date: 20220317 |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NON FINAL ACTION MAILED |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
|
| STPP | Information on status: patent application and granting procedure in general |
Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
|
| STCF | Information on status: patent grant |
Free format text: PATENTED CASE |