WO2018132961A1 - Apparatus, method and computer program product for object detection - Google Patents
Apparatus, method and computer program product for object detection
- Publication number
- WO2018132961A1 (PCT application PCT/CN2017/071477)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- neural network
- factor
- sample
- classification
- factors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Definitions
- Embodiments of the disclosure generally relate to information technologies, and, more particularly, to object detection.
- object detection plays an important role in most applications.
- object detection systems are broadly used for computer vision, automatic speech recognition, natural language processing, drug discovery and toxicology, customer relationship management, recommendation systems, and biomedical informatics.
- the object detection systems can be used in video surveillance, traffic surveillance, driver assistant systems, autonomous vehicles, traffic monitoring, human identification, human-computer interaction, public security, event detection, tracking, frontier guards and customs, scenario analysis and classification, image indexing and retrieval, etc.
- the input/sample of object detection systems may be degraded by at least two factors which may greatly influence the performance of the object detection systems.
- an image captured by the driver assistant system may be degraded by at least two of haze, rain, fog, sand, dust, sand storm, hailstone, dark light, etc.
- haze and dark light are two common sources of degrading image quality. They hamper the visibility of the scene and its objects. The intensity, hue and saturation of the scene and its objects are also altered by the haze and dark light. The performance of the driver assistant system degrades drastically when complex and challenging weather occurs.
- the apparatus may comprise at least one processor; and at least one memory including computer program code, the memory and the computer program code configured to, working with the at least one processor, cause the apparatus to receive a sample degraded by at least two factors; perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; stack the feature extracted by each feature extraction neural network to input to a classification neural network; and output the result of classification neural network as a detection result.
- the method may comprise receiving a sample degraded by at least two factors; performing the following operations for each factor of the at least two factors: removing a factor of the at least two factors from the sample by a factor removal neural network; computing residual information corresponding to the factor based on the sample and the output of the factor removal neural network; computing a difference between the output and a sum of residual information for all the other factor (s) except the factor; extracting a feature from the difference by a feature extraction neural network; stacking the feature extracted by each feature extraction neural network to input to a classification neural network; and outputting the result of classification neural network as a detection result.
- a computer program product embodied on a distribution medium readable by a computer and comprising program instructions which, when loaded into a computer, causes a processor to receive a sample degraded by at least two factors; perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; stack the feature extracted by each feature extraction neural network to input to a classification neural network; and output the result of classification neural network as a detection result.
- a non-transitory computer readable medium having encoded thereon statements and instructions to cause a processor to receive a sample degraded by at least two factors; perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; stack the feature extracted by each feature extraction neural network to input to a classification neural network; and output the result of classification neural network as a detection result.
- an apparatus comprising means configured to receive a sample degraded by at least two factors; means configured to perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; means configured to stack the feature extracted by each feature extraction neural network to input to a classification neural network; and means configured to output the result of classification neural network as a detection result.
- Figure 1 is a simplified block diagram showing an apparatus according to an embodiment
- Figure 2 is a flow chart depicting a process of a training stage of a neural network according to an embodiment of the present disclosure
- Figure 3 is a flow chart depicting a process of a testing stage of a neural network according to embodiments of the present disclosure
- Figure 4 schematically shows a neural network used for the training stage according to an embodiment of the disclosure.
- Figure 5 schematically shows a neural network used for the testing stage according to an embodiment of the disclosure.
- the term 'circuitry' refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry) ; (b) combinations of circuits and computer program product (s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor (s) or a portion of a microprocessor (s) , that require software or firmware for operation even if the software or firmware is not physically present.
- This definition of 'circuitry' applies to all uses of this term herein, including in any claims.
- the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
- the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network apparatus, other network apparatus, and/or other computing apparatus.
- the term 'non-transitory computer-readable medium' refers to a physical medium (e.g., a volatile or non-volatile memory device)
- although the embodiments are mainly described in the context of a deep convolutional neural network, they are not limited to this but can be applied to any suitable neural network. Moreover, the embodiments of the disclosure can be applied to automatic speech recognition, natural language processing, drug discovery and toxicology, customer relationship management, recommendation systems, and biomedical informatics, etc., though they are mainly discussed in the context of image recognition.
- FIG. 1 is a simplified block diagram showing an apparatus, such as an electronic apparatus 10, in which various embodiments of the disclosure may be applied. It should be understood, however, that the electronic apparatus as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the disclosure and, therefore, should not be taken to limit the scope of the disclosure. While the electronic apparatus 10 is illustrated and will be hereinafter described for purposes of example, other types of apparatuses may readily employ embodiments of the disclosure.
- the electronic apparatus 10 may be a portable digital assistant (PDAs) , a user equipment, a mobile computer, a desktop computer, a smart television, an intelligent glass, a gaming apparatus, a laptop computer, a media player, a camera, a video recorder, a mobile phone, a global positioning system (GPS) apparatus, a smart phone, a tablet, a server, a thin client, a cloud computer, a virtual server, a set-top box, a computing device, a distributed system, a smart glass, a vehicle navigation system, an advanced driver assistance systems (ADAS) , a self-driving apparatus, a video surveillance apparatus, an intelligent robotics, a virtual reality apparatus and/or any other types of electronic systems.
- the electronic apparatus 10 may run with any kind of operating system including, but not limited to, Windows, Linux, UNIX, Android, iOS and their variants. Moreover, the apparatus of at least one example embodiment need not to be the entire electronic apparatus, but may be a component or group of components of the electronic apparatus in other example embodiments.
- the electronic apparatus may readily employ embodiments of the disclosure regardless of their intent to provide mobility.
- embodiments of the disclosure may be utilized in conjunction with a variety of applications.
- the electronic apparatus 10 may comprise processor 11 and memory 12.
- Processor 11 may be any type of processor, controller, embedded controller, processor core, graphics processing unit (GPU) and/or the like.
- processor 11 utilizes computer program code to cause an apparatus to perform one or more actions.
- Memory 12 may comprise volatile memory, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data and/or other memory, for example, non-volatile memory, which may be embedded and/or may be removable.
- non-volatile memory may comprise an EEPROM, flash memory and/or the like.
- Memory 12 may store any of a number of pieces of information, and data.
- memory 12 includes computer program code such that the memory and the computer program code are configured to, working with the processor, cause the apparatus to perform one or more actions described herein.
- the electronic apparatus 10 may further comprise a communication device 15.
- communication device 15 comprises an antenna, (or multiple antennae) , a wired connector, and/or the like in operable communication with a transmitter and/or a receiver.
- processor 11 provides signals to a transmitter and/or receives signals from a receiver.
- the signals may comprise signaling information in accordance with a communications interface standard, user speech, received data, user generated data, and/or the like.
- Communication device 15 may operate with one or more air interface standards, communication protocols, modulation types, and access types.
- the electronic communication device 15 may operate in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA) ) , Global System for Mobile communications (GSM) , and IS-95 (code division multiple access (CDMA) ) , with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS) , CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA) , and/or with fourth-generation (4G) wireless communication protocols, wireless networking protocols, such as 802.11, short-range wireless protocols, such as Bluetooth, and/or the like.
- Communication device 15 may operate in accordance with wireline protocols, such as Ethernet, digital subscriber line (DSL) , and/or the like.
- Processor 11 may comprise means, such as circuitry, for implementing audio, video, communication, navigation, logic functions, and/or the like, as well as for implementing embodiments of the disclosure including, for example, one or more of the functions described herein.
- processor 11 may comprise means, such as a digital signal processor device, a microprocessor device, various analog to digital converters, digital to analog converters, processing circuitry and other support circuits, for performing various functions including, for example, one or more of the functions described herein.
- the apparatus may perform control and signal processing functions of the electronic apparatus 10 among these devices according to their respective capabilities.
- the processor 11 thus may comprise the functionality to encode and interleave message and data prior to modulation and transmission.
- the processor 11 may additionally comprise an internal voice coder, and may comprise an internal data modem. Further, the processor 11 may comprise functionality to operate one or more software programs, which may be stored in memory and which may, among other things, cause the processor 11 to implement at least one embodiment including, for example, one or more of the functions described herein. For example, the processor 11 may operate a connectivity program, such as a conventional internet browser.
- the connectivity program may allow the electronic apparatus 10 to transmit and receive internet content, such as location-based content and/or other web page content, according to a Transmission Control Protocol (TCP) , Internet Protocol (IP) , User Datagram Protocol (UDP) , Internet Message Access Protocol (IMAP) , Post Office Protocol (POP) , Simple Mail Transfer Protocol (SMTP) , Wireless Application Protocol (WAP) , Hypertext Transfer Protocol (HTTP) , and/or the like, for example.
- the electronic apparatus 10 may comprise a user interface for providing output and/or receiving input.
- the electronic apparatus 10 may comprise an output device 14.
- Output device 14 may comprise an audio output device, such as a ringer, an earphone, a speaker, and/or the like.
- Output device 14 may comprise a tactile output device, such as a vibration transducer, an electronically deformable surface, an electronically deformable structure, and/or the like.
- Output Device 14 may comprise a visual output device, such as a display, a light, and/or the like.
- the electronic apparatus may comprise an input device 13.
- Input device 13 may comprise a light sensor, a proximity sensor, a microphone, a touch sensor, a force sensor, a button, a keypad, a motion sensor, a magnetic field sensor, a camera, a removable storage device and/or the like.
- a touch sensor and a display may be characterized as a touch display.
- the touch display may be configured to receive input from a single point of contact, multiple points of contact, and/or the like.
- the touch display and/or the processor may determine input based, at least in part, on position, motion, speed, contact area, and/or the like.
- the electronic apparatus 10 may include any of a variety of touch displays including those that are configured to enable touch recognition by any of resistive, capacitive, infrared, strain gauge, surface wave, optical imaging, dispersive signal technology, acoustic pulse recognition or other techniques, and to then provide signals indicative of the location and other parameters associated with the touch. Additionally, the touch display may be configured to receive an indication of an input in the form of a touch event which may be defined as an actual physical contact between a selection object (e.g., a finger, stylus, pen, pencil, or other pointing device) and the touch display.
- a touch event may be defined as bringing the selection object in proximity to the touch display, hovering over a displayed object or approaching an object within a predefined distance, even though physical contact is not made with the touch display.
- a touch input may comprise any input that is detected by a touch display including touch events that involve actual physical contact and touch events that do not involve physical contact but that are otherwise detected by the touch display, such as a result of the proximity of the selection object to the touch display.
- a touch display may be capable of receiving information associated with force applied to the touch screen in relation to the touch input.
- the touch screen may differentiate between a heavy press touch input and a light press touch input.
- a display may display two-dimensional information, three-dimensional information and/or the like.
- the media capturing element may be any means for capturing an image, video, and/or audio for storage, display or transmission.
- the camera module may comprise a digital camera which may form a digital image file from a captured image.
- the camera module may comprise hardware, such as a lens or other optical component (s) , and/or software necessary for creating a digital image file from a captured image.
- the camera module may comprise only the hardware for viewing an image, while a memory device of the electronic apparatus 10 stores instructions for execution by the processor 11 in the form of software for creating a digital image file from a captured image.
- the camera module may further comprise a processing element such as a co-processor that assists the processor 11 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data.
- the encoder and/or decoder may encode and/or decode according to a standard format, for example, a Joint Photographic Experts Group (JPEG) standard format, a moving picture expert group (MPEG) standard format, a Video Coding Experts Group (VCEG) standard format or any other suitable standard formats.
- Figure 2 is a flow chart depicting a process 200 of a training stage of a neural network according to an embodiment of the present disclosure, which may be performed at an apparatus such as the electronic apparatus 10 (for example a distributed system or cloud computing) of Figure 1.
- the electronic apparatus 10 may provide means for accomplishing various parts of the process 200 as well as means for accomplishing other processes in conjunction with other components.
- the neural network may comprise a factor removal neural network, a feature extraction neural network and a classification neural network.
- the factor removal neural network may be used to remove a factor from the input/sample of the neural network.
- the factor removal neural network may be any suitable factor removal neural network, for example depending on the factor to be removed. In general, there may be a specific factor removal neural network for each factor. In other words, there may be n factor removal neural networks if there are n factors to be removed.
- the feature extraction neural network may be used to extract features.
- the classification neural network may be used for classification.
- the feature extraction neural network may be any suitable feature extraction neural network for example depending on the feature to be extracted.
- the classification neural network may be any suitable classification neural network for example depending on the feature to be classified.
- Each of the factor removal neural network, the feature extraction neural network and the classification neural network may comprise k layers, wherein k ≥ 3.
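- As a non-authoritative illustration (not taken from the publication), the three sub-networks could be realised as small convolutional stacks in PyTorch roughly as follows; the class names, layer counts, channel widths and class count are hypothetical placeholders, and the publication only requires that each sub-network has k ≥ 3 layers.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # One 3x3 convolution followed by a ReLU non-linearity.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class FactorRemovalNet(nn.Module):
    """Estimates the sample with one degradation factor removed (image-to-image)."""
    def __init__(self, channels=3, width=32, depth=5):
        super().__init__()
        layers = [conv_block(channels, width)]
        layers += [conv_block(width, width) for _ in range(depth - 2)]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)
    def forward(self, x):
        return self.body(x)

class FeatureExtractionNet(nn.Module):
    """Extracts a feature map from a restored (difference) sample."""
    def __init__(self, channels=3, width=64, depth=4):
        super().__init__()
        layers = [conv_block(channels, width)]
        layers += [conv_block(width, width) for _ in range(depth - 1)]
        self.body = nn.Sequential(*layers)
    def forward(self, x):
        return self.body(x)

class ClassificationNet(nn.Module):
    """Maps the stacked features to class scores (k >= 3 layers)."""
    def __init__(self, in_ch=64, num_classes=2, width=64):
        super().__init__()
        self.body = nn.Sequential(
            conv_block(in_ch, width),
            conv_block(width, width),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(width, num_classes),
        )
    def forward(self, x):
        return self.body(x)
```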
- FIG 4 schematically shows a neural network 400 used for the training stage according to an embodiment of the disclosure, wherein the neural network 400 can be used to process a sample degraded by 2 factors.
- the neural network 400 may comprise three parts: a factor removal part 402, a feature extraction part 404 and a classification part 406. It is noted that the neural network 400 can be easily expanded to any other neural network which can process a sample degraded by more than 2 factors. The process 200 will be described in detail with reference to Figures 2 and 4.
- the process 200 may start at block 202 where the parameters/weights of the neural network 400 (the factor removal neural network, the feature extraction neural network and the classification neural network) are initialized with for example random values. Parameters like the number of filters, filter sizes, architecture of the network etc. have all been fixed before block 202 and do not change during the training stage.
- the electronic apparatus 10 receives a set of pairs of training samples with labels, wherein each pair of training samples contains a first training sample degraded by n ≥ 2 factors and a second training sample where a factor j ∈ [1, ..., n] of the n factors does not degrade the first training sample.
- the training sample may be any suitable sample which can be processed by the neural network 400, such as image, audio or text.
- the label may indicate the classification of the training sample.
- the set of pairs of training samples with labels may be pre-stored in a memory of the electronic apparatus 10, or retrieved from a network location or a local location.
- the factors may be any factors which can degrade the sample such as image, audio, text or any other suitable sample.
- the training sample is the image, and the factors may comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball, pollen.
- a pair of training samples contains the first training sample degraded by the 2 factors and the second training sample where a factor j ∈ [1, 2] does not degrade the first training sample.
- the first training sample (such as an image) shown by Input is input respectively to the factor removal neural networks 410 and 408 and the second training sample shown by Ground truth1 and Ground truth2 may be stored in the neural network 400, wherein the Ground truth1 stands for a second training sample where factor 1 does not degrade the first training sample, and Ground truth2 stands for a second training sample where factor 2 does not degrade the first training sample.
- the electronic apparatus 10 may perform the following operations for the factor j of each pair of training samples: remove the factor j from the first training sample by the factor removal neural network; compute residual information R_j corresponding to the factor j based on the first training sample and the output C_j of the factor removal neural network; compute loss L_j of the factor removal neural network based on the difference between the output C_j and the second training sample; compute a difference D_j between the output C_j and a sum of residual information for the n-1 factor(s) other than j; extract a feature from the difference D_j by the feature extraction neural network.
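- A minimal sketch of these per-factor computations for the two-factor case of Figure 4, assuming the hypothetical sub-network classes sketched above and a pixel-wise L2 loss for each factor removal network (the publication does not prescribe a particular loss form); feature extraction from each D_j is shown in the later training sketch.

```python
import torch
import torch.nn.functional as F

def per_factor_terms(x, ground_truths, removal_nets):
    """x: degraded first training sample; ground_truths[j]: sample not degraded by factor j;
    removal_nets[j]: factor removal network for factor j."""
    C, R, losses = [], [], []
    for j, net in enumerate(removal_nets):
        C_j = net(x)                                  # estimated sample with factor j removed
        R_j = x - C_j                                 # residual information for factor j
        L_j = F.mse_loss(C_j, ground_truths[j])       # factor removal loss (assumed L2)
        C.append(C_j)
        R.append(R_j)
        losses.append(L_j)
    # Difference D_j: subtract the residuals of all *other* factors from C_j.
    D = [C[j] - sum(R[m] for m in range(len(R)) if m != j) for j in range(len(C))]
    return C, R, D, losses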
- factors 1 and 2 may be removed respectively from the first training sample by the factor removal neural networks 408 and 410.
- the factor removal neural networks 408 and 410 may be different neural networks each of which is suitable for removing a specific factor from the first training sample.
- the factor removal neural networks 408 and 410 may each contain a plurality of layers.
- the number of layers of the factor removal neural networks 408 and 410 may be different though the same number of layers m is shown in Figure 4.
- the layers of the factor removal neural networks 408 and 410 are denoted by respective layer symbols in Figure 4. The output C_1 of the factor removal neural network 408 stands for the estimated sample where factor 1 has been removed from the first training sample, and the output C_2 of the factor removal neural network 410 stands for the estimated sample where factor 2 has been removed from the first training sample.
- residual information R_1 corresponding to factor 1 is computed by subtracting the output C_1 of the factor removal neural network 408 from the first training sample.
- residual information R_2 corresponding to factor 2 is computed by subtracting the output C_2 of the factor removal neural network 410 from the first training sample.
- the loss Loss1 of the factor removal neural network 408 may be computed by subtracting the output of the factor removal neural network 408 from the second training sample Ground truth1.
- the loss Loss2 of the factor removal neural network 410 may be computed by subtracting the output of the factor removal neural network 410 from the second training sample Ground truth2.
- the differences D_1 and D_2 may each stand for a sample where both factor 1 and factor 2 are removed from the first training sample: D_1 is obtained by subtracting R_2 from the output C_1, and D_2 is obtained by subtracting R_1 from the output C_2.
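- In the two-factor case, using x for the degraded input, C_i for the output of removal network i and R_i for its residual information (notation taken from the description above), these relations can be written as:

```latex
R_i = x - C_i, \qquad D_i = C_i - \sum_{m \neq i} R_m,
\quad\text{so that for two factors}\quad
D_1 = C_1 - R_2 = C_1 - (x - C_2), \qquad D_2 = C_2 - R_1 = C_2 - (x - C_1).
```

- If both removal networks were perfect, each D_i would approximate the sample with both factors removed, which is what the differences are stated to stand for above.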
- the differences D_1 and D_2 may be input to the feature extraction neural networks 412 and 414, respectively, to extract features from them.
- the output of the feature extraction neural network 412 may stand for the feature extracted from the difference D_1, and the output of the feature extraction neural network 414 may stand for the feature extracted from the difference D_2.
- the feature extraction neural network 412 and 414 may be the same feature extraction neural network.
- the feature extraction neural network 412 and 414 may each contain a plurality of layers.
- the layers of the feature extraction neural networks 412 and 414 are denoted by respective layer symbols in Figure 4.
- the feature extraction neural network 412 and 414 may be any suitable feature extraction neural network for example depending on the features to be extracted.
- the electronic apparatus 10 may stack n features to input to the classification neural network.
- the feature layers output by the feature extraction neural networks 412 and 414 are stacked to form the first layer E_1 of the classification neural network 416.
- the stacking operation may further comprise convolution operation, activation operation and pooling operation.
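- As a hedged illustration of this stacking step (channel widths and the pooling choice are assumptions, not from the publication): the per-factor feature maps can be concatenated along the channel dimension to form E_1 and then passed through a convolution, activation and pooling before the remaining classification layers.

```python
import torch
import torch.nn as nn

class StackAndReduce(nn.Module):
    """Stacks n per-factor feature maps and applies convolution, activation and pooling."""
    def __init__(self, feat_ch=64, n_factors=2, out_ch=64):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(feat_ch * n_factors, out_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
        )
    def forward(self, features):
        # features: list of per-factor feature maps with identical spatial size
        stacked = torch.cat(features, dim=1)   # channel-wise stacking forms layer E_1
        return self.reduce(stacked)
```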
- the classification neural network 416 may comprise k layers denoted by E_1, E_2, ..., E_k, wherein k ≥ 3.
- the classification neural network 416 may be any suitable classification neural network for example depending on the features to be classified.
- the last layer E k of the classification neural network 416 is the classification/detection result.
- the electronic apparatus 10 may compute classification loss based on the result of classification neural network and the label of the first training sample. For example, as shown in Figure 4, the electronic apparatus 10 may compute the classification loss at block 418.
- the electronic apparatus 10 may add the n losses L_j and the classification loss to form a joint loss.
- the electronic apparatus 10 may add Loss1, Loss2 and the classification loss to form the joint loss at block 420.
- the electronic apparatus 10 may learn the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network by minimizing the joint loss with the standard back-propagation algorithm. It is noted that the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network may be learned by minimizing the classification loss with the standard back-propagation algorithm in other embodiments, and in this case, the computation of Loss1 and Loss2, and the adding operation may be omitted.
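- Putting the pieces together, one joint training iteration might look like the following sketch (the optimizer, the cross-entropy classification loss and the unweighted sum forming the joint loss are assumptions; the publication only requires minimizing the joint loss with the standard back-propagation algorithm). It reuses the hypothetical per_factor_terms and StackAndReduce helpers sketched earlier.

```python
import torch
import torch.nn.functional as F

def training_step(x, ground_truths, label, removal_nets, feature_nets,
                  stack, classifier, optimizer):
    # Per-factor restoration, residuals, removal losses and differences.
    C, R, D, removal_losses = per_factor_terms(x, ground_truths, removal_nets)
    feats = [feature_nets[j](D[j]) for j in range(len(D))]   # per-factor features
    logits = classifier(stack(feats))                        # stacked features -> class scores
    cls_loss = F.cross_entropy(logits, label)                # classification loss
    joint_loss = cls_loss + sum(removal_losses)              # joint loss (Loss1 + Loss2 + classification loss)
    optimizer.zero_grad()
    joint_loss.backward()                                    # standard back-propagation
    optimizer.step()
    return joint_loss.item()
```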
- FIG 3 is a flow chart depicting a process 300 of a testing stage of a neural network according to embodiments of the present disclosure, which may be performed at an apparatus such as the electronic apparatus 10 (for example an advanced driver assistance system (ADAS) or a self-driving apparatus) of Figure 1.
- the electronic apparatus 10 may provide means for accomplishing various parts of the process 300 as well as means for accomplishing other processes in conjunction with other components.
- the neural network has been trained by using the process 200 of Figure 2.
- the neural network may comprise a factor removal neural network, a feature extraction neural network and a classification neural network.
- Figure 5 schematically shows a neural network 500 used for the testing stage according to an embodiment of the disclosure, wherein the neural network 500 can be used to process a sample degraded by 2 factors.
- the neural network 500 may comprise three parts: a factor removal part 502, a feature extraction part 504 and a classification part 506. It is noted that the neural network 500 can be easily expanded to any other neural network which can process a sample degraded by more than 2 factors.
- the process 300 will be described in detail with reference to Figures 3 and 5.
- the process 300 may start at block 302 where the parameters/weights of the neural network 500 (the factor removal neural network, the feature extraction neural network and the classification neural network) are initialized with the values obtained in the training stage.
- the electronic apparatus 10 receives a sample degraded by n factors, wherein n ≥ 2.
- the sample may be any suitable sample which can be processed by the neural network 500, such as image, audio or text.
- the sample may be pre-stored in a memory of the electronic apparatus 10, retrieved from a network location or a local location, or captured in real time for example by the ADAS/autonomous vehicle.
- the sample is the image, and the factors may comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball, pollen.
- the sample (such as an image) shown by Input is input respectively to the factor removal neural networks 510 and 508.
- the electronic apparatus 10 may perform the following operations for each factor i ∈ [1, ..., n] of the n factors: remove the factor i from the sample by a factor removal neural network; compute residual information R_i corresponding to the factor i based on the sample and the output C_i of the factor removal neural network; compute a difference D_i between the output C_i and a sum of residual information for the n-1 factor(s) other than the factor i; extract a feature from the difference D_i by the feature extraction neural network.
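- A compact sketch of the testing-stage forward pass, reusing the hypothetical class and function names from the training sketches (no ground truth or losses are involved at this stage):

```python
import torch

@torch.no_grad()
def detect(x, removal_nets, feature_nets, stack, classifier):
    C = [net(x) for net in removal_nets]                       # factor-removed estimates
    R = [x - c for c in C]                                     # residual information per factor
    D = [C[i] - sum(R[m] for m in range(len(R)) if m != i)     # remove the other factors' residuals
         for i in range(len(C))]
    feats = [feature_nets[i](D[i]) for i in range(len(D))]     # per-factor features
    logits = classifier(stack(feats))                          # stacked features -> scores
    return logits.argmax(dim=1)                                # detection/classification result
```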
- factors 1 and 2 may be removed respectively from the sample by the factor removal neural networks 508 and 510.
- the factor removal neural networks 508 and 510 may be different neural networks each of which is suitable for removing a specific factor from the sample.
- the factor removal neural networks 508 and 510 may each contain a plurality of layers.
- the number of layers of the factor removal neural networks 508 and 510 may be different though the same number of layers m is shown in Figure 5.
- the layers of the factor removal neural networks 508 and 510 are denoted by respective layer symbols in Figure 5. The output C_1 of the factor removal neural network 508 stands for the estimated sample where factor 1 has been removed from the sample, and the output C_2 of the factor removal neural network 510 stands for the estimated sample where factor 2 has been removed from the sample.
- residual information R_1 corresponding to factor 1 is computed by subtracting the output C_1 of the factor removal neural network 508 from the sample.
- residual information R_2 corresponding to factor 2 is computed by subtracting the output C_2 of the factor removal neural network 510 from the sample.
- the differences D_1 and D_2 may each stand for a processed sample where both factor 1 and factor 2 are removed from the sample.
- the differences D_1 and D_2 may be input to the feature extraction neural networks 512 and 514, respectively, to extract features from them.
- the output of the feature extraction neural network 512 may stand for the feature extracted from the difference D_1, and the output of the feature extraction neural network 514 may stand for the feature extracted from the difference D_2.
- the feature extraction neural network 512 and 514 may be the same feature extraction neural network.
- the feature extraction neural networks 512 and 514 may each contain a plurality of layers, denoted by respective layer symbols in Figure 5. In another embodiment, there may be one feature extraction neural network in the neural network 500, and each difference may be sequentially input to that feature extraction neural network.
- the feature extraction neural network 512 and 514 may be any suitable feature extraction neural network for example depending on the features to be extracted.
- the electronic apparatus 10 may stack n features to input to the classification neural network.
- the feature layers output by the feature extraction neural networks 512 and 514 are stacked to form the first layer E_1 of the classification neural network 516.
- the stacking operation may further comprise convolution operation, activation operation and pooling operation.
- the classification neural network 516 may comprise k layers denoted by E_1, E_2, ..., E_k, wherein k ≥ 3.
- the classification neural network 516 may be any suitable classification neural network for example depending on the features to be classified.
- the last layer E_k of the classification neural network 516 may be output as a detection/classification result at block 310.
- the process 300 may be used in the ADAS/autonomous vehicle, such as for object detection.
- the ADAS or autonomous vehicle is equipped with a vision system.
- the process 300 can be integrated into the vision system.
- an image is captured by a camera and the important objects such as pedestrians and bicycles are detected from the image by the process 300.
- some form of warning (e.g., a warning voice) may be generated if important objects (e.g., pedestrians) are detected, so that the driver of the vehicle can pay attention to the objects and try to avoid a traffic accident.
- the detected objects may be used as inputs of a control module and the control module takes proper action according to the objects.
- the method constructs a neural network (such as a deep convolutional neural network) which greatly improves the performance of object detection systems, wherein a sample which is input to the object detection systems is degraded by at least two factors.
- the restoration residual, such as R_1 in Figure 4, corresponding to one factor is used to deal with the sample degraded by another factor, which greatly weakens the negative influence of the factors on the sample.
- the adverse factor removal, feature extraction, and classification are jointly performed under the framework of a neural network such as the deep convolutional neural network.
- an apparatus for object detection may comprise means configured to carry out the processes described above.
- the apparatus comprises means configured to receive a sample degraded by at least two factors; means configured to perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; means configured to stack the feature extracted by each feature extraction neural network to input to a classification neural network; and means configured to output the result of classification neural network as a detection result.
- the apparatus further comprises means configured to train the factor removal neural network, the feature extraction neural network and the classification neural network.
- the apparatus further comprises means configured to receive a set of pairs of training samples with labels, wherein each pair of training samples contains a first training sample degraded by the at least two factors and a second training sample where a factor of the at least two factors does not degrade the first training sample; means configured to perform the following operations for the factor of each pair of training samples: remove the factor from the first training sample by the factor removal neural network; compute residual information corresponding to the factor based on the first training sample and the output of the factor removal neural network; compute a loss of the factor removal neural network based on the difference between the output and the second training sample; compute a difference between the output and a sum of residual information for all the other factor(s) except the factor; extract a feature from the difference by the feature extraction neural network; means configured to stack the feature extracted by each feature extraction neural network to input to the classification neural network; means configured to compute a classification loss based on the result of the classification neural network and the label of the first training sample; means configured to add the loss of each factor removal neural network and the classification loss to form a joint loss; and means configured to learn the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network by minimizing the joint loss.
- the sample and the training sample comprise one of image, audio and text.
- the sample and the training sample are images, and the factors comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball, pollen.
- the apparatus is used in an advanced driver assistance system/autonomous vehicle.
- the neural network comprises a convolutional neural network.
- any of the components of the apparatus described above can be implemented as hardware or software modules.
- if implemented as software modules, they can be embodied on a tangible computer-readable recordable storage medium. All of the software modules (or any subset thereof) can be on the same medium, or each can be on a different medium, for example.
- the software modules can run, for example, on a hardware processor. The method steps can then be carried out using the distinct software modules, as described above, executing on a hardware processor.
- an aspect of the disclosure can make use of software running on a general purpose computer or workstation.
- such an implementation might employ, for example, a processor, a memory, and an input/output interface formed, for example, by a display and a keyboard.
- the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor.
- memory is intended to include memory associated with a processor or CPU, such as, for example, RAM (random access memory) , ROM (read only memory) , a fixed memory device (for example, hard drive) , a removable memory device (for example, diskette) , a flash memory and the like.
- the processor, memory, and input/output interface such as display and keyboard can be interconnected, for example, via bus as part of a data processing unit. Suitable interconnections, for example via bus, can also be provided to a network interface, such as a network card, which can be provided to interface with a computer network, and to a media interface, such as a diskette or CD-ROM drive, which can be provided to interface with media.
- computer software including instructions or code for performing the methodologies of the disclosure, as described herein, may be stored in associated memory devices (for example, ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (for example, into RAM) and implemented by a CPU.
- Such software could include, but is not limited to, firmware, resident software, microcode, and the like.
- aspects of the disclosure may take the form of a computer program product embodied in a computer readable medium having computer readable program code embodied thereon.
- computer readable media may be a computer readable signal medium or a computer readable storage medium.
- a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
- a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
- Computer program code for carrying out operations for aspects of the disclosure may be written in any combination of at least one programming language, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
- the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- each block in the flowchart or block diagrams may represent a module, component, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function (s) .
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- connection or coupling means any connection or coupling, either direct or indirect, between two or more elements, and may encompass the presence of one or more intermediate elements between two elements that are “connected” or “coupled” together.
- the coupling or connection between the elements can be physical, logical, or a combination thereof.
- two elements may be considered to be “connected” or “coupled” together by the use of one or more wires, cables and/or printed electrical connections, as well as by the use of electromagnetic energy, such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region and the optical region (both visible and invisible) , as several non-limiting and non-exhaustive examples.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Computing Systems (AREA)
- Biodiversity & Conservation Biology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to an apparatus, a method, a computer program product and a computer-readable medium for object detection. The method comprises: receiving a sample degraded by at least two factors (304); performing the following operations for each of the factors: removing a factor of the at least two factors from the sample by means of a factor removal neural network; computing residual information corresponding to the factor on the basis of the sample and the output of the factor removal neural network; computing a difference between the output and a sum of residual information for all the other factors except the factor; extracting a feature from the difference by means of a feature extraction neural network; stacking the feature extracted by each feature extraction neural network for input to a classification neural network (308); and outputting the result of the classification neural network as a detection result (310).
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2017/071477 WO2018132961A1 (fr) | 2017-01-18 | 2017-01-18 | Appareil, procédé et produit-programme d'ordinateur pour une détection d'objet |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2017/071477 WO2018132961A1 (fr) | 2017-01-18 | 2017-01-18 | Appareil, procédé et produit-programme d'ordinateur pour une détection d'objet |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018132961A1 true WO2018132961A1 (fr) | 2018-07-26 |
Family
ID=62907723
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/071477 Ceased WO2018132961A1 (fr) | 2017-01-18 | 2017-01-18 | Appareil, procédé et produit-programme d'ordinateur pour une détection d'objet |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2018132961A1 (fr) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109697464A (zh) * | 2018-12-17 | 2019-04-30 | 环球智达科技(北京)有限公司 | 基于物体检测和特征搜索的精确目标识别的方法及系统 |
| CN110570371A (zh) * | 2019-08-28 | 2019-12-13 | 天津大学 | 一种基于多尺度残差学习的图像去雾方法 |
| CN112132169A (zh) * | 2019-06-25 | 2020-12-25 | 富士通株式会社 | 信息处理装置和信息处理方法 |
| CN112184590A (zh) * | 2020-09-30 | 2021-01-05 | 西安理工大学 | 一种基于灰度世界自引导网络的单幅沙尘图像恢复方法 |
| CN114283350A (zh) * | 2021-09-17 | 2022-04-05 | 腾讯科技(深圳)有限公司 | 视觉模型训练和视频处理方法、装置、设备及存储介质 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103914813A (zh) * | 2014-04-10 | 2014-07-09 | 西安电子科技大学 | 彩色雾霾图像去雾与光照补偿的复原方法 |
| US20160078605A1 (en) * | 2014-09-16 | 2016-03-17 | National Taipei University Of Technology | Image restoration method and image processing apparatus using the same |
| CN105844257A (zh) * | 2016-04-11 | 2016-08-10 | 吉林大学 | 基于机器视觉雾天行车错失道路标志牌的预警系统及方法 |
Patent Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN103914813A (zh) * | 2014-04-10 | 2014-07-09 | 西安电子科技大学 | 彩色雾霾图像去雾与光照补偿的复原方法 |
| US20160078605A1 (en) * | 2014-09-16 | 2016-03-17 | National Taipei University Of Technology | Image restoration method and image processing apparatus using the same |
| CN105844257A (zh) * | 2016-04-11 | 2016-08-10 | 吉林大学 | 基于机器视觉雾天行车错失道路标志牌的预警系统及方法 |
Non-Patent Citations (2)
| Title |
|---|
| HE KAIMING ET AL.: "Deep Residual Learning for Image Recognition", 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 31 December 2016 (2016-12-31), XP055353100 * |
| TIAN YU ET AL.: "Adaptive Optics Images Restoration Based on Frame Selection and Multi- Frame Blind Deconvolution", ACTA ASTRONOMICA SINICA, vol. 49, no. 4, 31 October 2008 (2008-10-31), XP026091076, ISSN: 0001-5245, Retrieved from the Internet <URL:https://doi.org/10.1016/j.chinastron.2009.03.004> * |
Cited By (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109697464A (zh) * | 2018-12-17 | 2019-04-30 | 环球智达科技(北京)有限公司 | 基于物体检测和特征搜索的精确目标识别的方法及系统 |
| CN112132169A (zh) * | 2019-06-25 | 2020-12-25 | 富士通株式会社 | 信息处理装置和信息处理方法 |
| CN112132169B (zh) * | 2019-06-25 | 2023-08-04 | 富士通株式会社 | 信息处理装置和信息处理方法 |
| CN110570371A (zh) * | 2019-08-28 | 2019-12-13 | 天津大学 | 一种基于多尺度残差学习的图像去雾方法 |
| CN110570371B (zh) * | 2019-08-28 | 2023-08-29 | 天津大学 | 一种基于多尺度残差学习的图像去雾方法 |
| CN112184590A (zh) * | 2020-09-30 | 2021-01-05 | 西安理工大学 | 一种基于灰度世界自引导网络的单幅沙尘图像恢复方法 |
| CN112184590B (zh) * | 2020-09-30 | 2024-03-26 | 西安理工大学 | 一种基于灰度世界自引导网络的单幅沙尘图像恢复方法 |
| CN114283350A (zh) * | 2021-09-17 | 2022-04-05 | 腾讯科技(深圳)有限公司 | 视觉模型训练和视频处理方法、装置、设备及存储介质 |
| CN114283350B (zh) * | 2021-09-17 | 2024-06-07 | 腾讯科技(深圳)有限公司 | 视觉模型训练和视频处理方法、装置、设备及存储介质 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17893456; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 17893456; Country of ref document: EP; Kind code of ref document: A1 |