
WO2018132961A1 - Apparatus, method and computer program product for object detection - Google Patents

Apparatus, method and computer program product for object detection

Info

Publication number
WO2018132961A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
factor
sample
classification
factors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2017/071477
Other languages
French (fr)
Inventor
Jiale CAO
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Beijing Co Ltd
Nokia Technologies Oy
Original Assignee
Nokia Technologies Beijing Co Ltd
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Beijing Co Ltd and Nokia Technologies Oy
Priority to PCT/CN2017/071477
Publication of WO2018132961A1
Legal status: Ceased

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24133 Distances to prototypes
    • G06F 18/24137 Distances to cluster centroïds
    • G06F 18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • Embodiments of the disclosure generally relate to information technologies, and, more particularly, to object detection.
  • object detection plays an important role in many applications.
  • object detection systems are broadly used in computer vision, automatic speech recognition, natural language processing, drug discovery and toxicology, customer relationship management, recommendation systems, and biomedical informatics.
  • the object detection systems can be used in video surveillance, traffic surveillance, driver assistant systems, autonomous vehicles, traffic monitoring, human identification, human-computer interaction, public security, event detection, tracking, frontier guards and customs, scenario analysis and classification, image indexing and retrieval, etc.
  • the input/sample of object detection systems may be degraded by at least two factors which may greatly influence the performance of the object detection systems.
  • an image captured by the driver assistant system may be degraded by at least two of haze, rain, fog, sand, dust, sand storm, hailstone, dark light, etc.
  • haze and dark light are two common sources of image quality degradation. They hamper the visibility of the scene and its objects. The intensity, hue and saturation of the scene and its objects are also altered by haze and dark light. The performance of the driver assistant system degrades drastically in complex and challenging weather.
  • the apparatus may comprise at least one processor; and at least one memory including computer program code, the memory and the computer program code configured to, working with the at least one processor, cause the apparatus to receive a sample degraded by at least two factors; perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; stack the feature extracted by each feature extraction neural network to input to a classification neural network; and output the result of classification neural network as a detection result.
  • the method may comprise receiving a sample degraded by at least two factors; performing the following operations for each factor of the at least two factors: removing a factor of the at least two factors from the sample by a factor removal neural network; computing residual information corresponding to the factor based on the sample and the output of the factor removal neural network; computing a difference between the output and a sum of residual information for all the other factor (s) except the factor; extracting a feature from the difference by a feature extraction neural network; stacking the feature extracted by each feature extraction neural network to input to a classification neural network; and outputting the result of classification neural network as a detection result.
  • a computer program product embodied on a distribution medium readable by a computer and comprising program instructions which, when loaded into a computer, cause a processor to receive a sample degraded by at least two factors; perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; stack the feature extracted by each feature extraction neural network to input to a classification neural network; and output the result of the classification neural network as a detection result.
  • a non-transitory computer readable medium having encoded thereon statements and instructions to cause a processor to receive a sample degraded by at least two factors; perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; stack the feature extracted by each feature extraction neural network to input to a classification neural network; and output the result of classification neural network as a detection result.
  • an apparatus comprising means configured to receive a sample degraded by at least two factors; means configured to perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; means configured to stack the feature extracted by each feature extraction neural network to input to a classification neural network; and means configured to output the result of classification neural network as a detection result.
  • Figure 1 is a simplified block diagram showing an apparatus according to an embodiment
  • Figure 2 is a flow chart depicting a process of a training stage of a neural network according to an embodiment of the present disclosure
  • Figure 3 is a flow chart depicting a process of a testing stage of a neural network according to embodiments of the present disclosure
  • Figure 4 schematically shows a neural network used for the training stage according to an embodiment of the disclosure.
  • Figure 5 schematically shows a neural network used for the testing stage according to an embodiment of the disclosure.
  • the term 'circuitry' refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry) ; (b) combinations of circuits and computer program product (s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor (s) or a portion of a microprocessor (s) , that require software or firmware for operation even if the software or firmware is not physically present.
  • This definition of 'circuitry' applies to all uses of this term herein, including in any claims.
  • the term 'circuitry' also includes an implementation comprising one or more processors and/or portion (s) thereof and accompanying software and/or firmware.
  • the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network apparatus, other network apparatus, and/or other computing apparatus.
  • non-transitory computer-readable medium which refers to a physical medium (e.g., volatile or non-volatile memory device)
  • though the embodiments are mainly described in the context of a deep convolutional neural network, they are not limited to this but can be applied to any suitable neural network. Moreover, the embodiments of the disclosure can be applied to automatic speech recognition, natural language processing, drug discovery and toxicology, customer relationship management, recommendation systems, and biomedical informatics, etc., though they are mainly discussed in the context of image recognition.
  • FIG. 1 is a simplified block diagram showing an apparatus, such as an electronic apparatus 10, in which various embodiments of the disclosure may be applied. It should be understood, however, that the electronic apparatus as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the disclosure and, therefore, should not be taken to limit the scope of the disclosure. While the electronic apparatus 10 is illustrated and will be hereinafter described for purposes of example, other types of apparatuses may readily employ embodiments of the disclosure.
  • the electronic apparatus 10 may be a portable digital assistant (PDA) , a user equipment, a mobile computer, a desktop computer, a smart television, an intelligent glass, a gaming apparatus, a laptop computer, a media player, a camera, a video recorder, a mobile phone, a global positioning system (GPS) apparatus, a smart phone, a tablet, a server, a thin client, a cloud computer, a virtual server, a set-top box, a computing device, a distributed system, a smart glass, a vehicle navigation system, an advanced driver assistance system (ADAS) , a self-driving apparatus, a video surveillance apparatus, an intelligent robot, a virtual reality apparatus and/or any other type of electronic system.
  • the electronic apparatus 10 may run with any kind of operating system including, but not limited to, Windows, Linux, UNIX, Android, iOS and their variants. Moreover, the apparatus of at least one example embodiment need not be the entire electronic apparatus, but may be a component or group of components of the electronic apparatus in other example embodiments.
  • the electronic apparatus may readily employ embodiments of the disclosure regardless of their intent to provide mobility.
  • embodiments of the disclosure may be utilized in conjunction with a variety of applications.
  • the electronic apparatus 10 may comprise processor 11 and memory 12.
  • Processor 11 may be any type of processor, controller, embedded controller, processor core, graphics processing unit (GPU) and/or the like.
  • processor 11 utilizes computer program code to cause an apparatus to perform one or more actions.
  • Memory 12 may comprise volatile memory, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data and/or other memory, for example, non-volatile memory, which may be embedded and/or may be removable.
  • non-volatile memory may comprise an EEPROM, flash memory and/or the like.
  • Memory 12 may store any of a number of pieces of information, and data.
  • memory 12 includes computer program code such that the memory and the computer program code are configured to, working with the processor, cause the apparatus to perform one or more actions described herein.
  • the electronic apparatus 10 may further comprise a communication device 15.
  • communication device 15 comprises an antenna (or multiple antennae) , a wired connector, and/or the like in operable communication with a transmitter and/or a receiver.
  • processor 11 provides signals to a transmitter and/or receives signals from a receiver.
  • the signals may comprise signaling information in accordance with a communications interface standard, user speech, received data, user generated data, and/or the like.
  • Communication device 15 may operate with one or more air interface standards, communication protocols, modulation types, and access types.
  • the electronic communication device 15 may operate in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA) ) , Global System for Mobile communications (GSM) , and IS-95 (code division multiple access (CDMA) ) , with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS) , CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA) , and/or with fourth-generation (4G) wireless communication protocols, wireless networking protocols, such as 802.11, short-range wireless protocols, such as Bluetooth, and/or the like.
  • Communication device 15 may operate in accordance with wireline protocols, such as Ethernet, digital subscriber line (DSL) , and/or the like.
  • Processor 11 may comprise means, such as circuitry, for implementing audio, video, communication, navigation, logic functions, and/or the like, as well as for implementing embodiments of the disclosure including, for example, one or more of the functions described herein.
  • processor 11 may comprise means, such as a digital signal processor device, a microprocessor device, various analog to digital converters, digital to analog converters, processing circuitry and other support circuits, for performing various functions including, for example, one or more of the functions described herein.
  • the apparatus may perform control and signal processing functions of the electronic apparatus 10 among these devices according to their respective capabilities.
  • the processor 11 thus may comprise the functionality to encode and interleave messages and data prior to modulation and transmission.
  • the processor 11 may additionally comprise an internal voice coder, and may comprise an internal data modem. Further, the processor 11 may comprise functionality to operate one or more software programs, which may be stored in memory and which may, among other things, cause the processor 11 to implement at least one embodiment including, for example, one or more of the functions described herein. For example, the processor 11 may operate a connectivity program, such as a conventional internet browser.
  • the connectivity program may allow the electronic apparatus 10 to transmit and receive internet content, such as location-based content and/or other web page content, according to a Transmission Control Protocol (TCP) , Internet Protocol (IP) , User Datagram Protocol (UDP) , Internet Message Access Protocol (IMAP) , Post Office Protocol (POP) , Simple Mail Transfer Protocol (SMTP) , Wireless Application Protocol (WAP) , Hypertext Transfer Protocol (HTTP) , and/or the like, for example.
  • the electronic apparatus 10 may comprise a user interface for providing output and/or receiving input.
  • the electronic apparatus 10 may comprise an output device 14.
  • Output device 14 may comprise an audio output device, such as a ringer, an earphone, a speaker, and/or the like.
  • Output device 14 may comprise a tactile output device, such as a vibration transducer, an electronically deformable surface, an electronically deformable structure, and/or the like.
  • Output Device 14 may comprise a visual output device, such as a display, a light, and/or the like.
  • the electronic apparatus may comprise an input device 13.
  • Input device 13 may comprise a light sensor, a proximity sensor, a microphone, a touch sensor, a force sensor, a button, a keypad, a motion sensor, a magnetic field sensor, a camera, a removable storage device and/or the like.
  • a touch sensor and a display may be characterized as a touch display.
  • the touch display may be configured to receive input from a single point of contact, multiple points of contact, and/or the like.
  • the touch display and/or the processor may determine input based, at least in part, on position, motion, speed, contact area, and/or the like.
  • the electronic apparatus 10 may include any of a variety of touch displays including those that are configured to enable touch recognition by any of resistive, capacitive, infrared, strain gauge, surface wave, optical imaging, dispersive signal technology, acoustic pulse recognition or other techniques, and to then provide signals indicative of the location and other parameters associated with the touch. Additionally, the touch display may be configured to receive an indication of an input in the form of a touch event which may be defined as an actual physical contact between a selection object (e.g., a finger, stylus, pen, pencil, or other pointing device) and the touch display.
  • a touch event may be defined as bringing the selection object in proximity to the touch display, hovering over a displayed object or approaching an object within a predefined distance, even though physical contact is not made with the touch display.
  • a touch input may comprise any input that is detected by a touch display including touch events that involve actual physical contact and touch events that do not involve physical contact but that are otherwise detected by the touch display, such as a result of the proximity of the selection object to the touch display.
  • a touch display may be capable of receiving information associated with force applied to the touch screen in relation to the touch input.
  • the touch screen may differentiate between a heavy press touch input and a light press touch input.
  • a display may display two-dimensional information, three-dimensional information and/or the like.
  • the media capturing element may be any means for capturing an image, video, and/or audio for storage, display or transmission.
  • the camera module may comprise a digital camera which may form a digital image file from a captured image.
  • the camera module may comprise hardware, such as a lens or other optical component (s) , and/or software necessary for creating a digital image file from a captured image.
  • the camera module may comprise only the hardware for viewing an image, while a memory device of the electronic apparatus 10 stores instructions for execution by the processor 11 in the form of software for creating a digital image file from a captured image.
  • the camera module may further comprise a processing element such as a co-processor that assists the processor 11 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data.
  • the encoder and/or decoder may encode and/or decode according to a standard format, for example, a Joint Photographic Experts Group (JPEG) standard format, a moving picture expert group (MPEG) standard format, a Video Coding Experts Group (VCEG) standard format or any other suitable standard formats.
  • Figure 2 is a flow chart depicting a process 200 of a training stage of a neural network according to an embodiment of the present disclosure, which may be performed at an apparatus such as the electronic apparatus 10 (for example a distributed system or cloud computing) of Figure 1.
  • the electronic apparatus 10 may provide means for accomplishing various parts of the process 200 as well as means for accomplishing other processes in conjunction with other components.
  • the neural network may comprise a factor removal neural network, a feature extraction neural network and a classification neural network.
  • the factor removal neural network may be used to remove a factor from the input/sample of the neural network.
  • the factor removal neural network may be any suitable factor removal neural network, for example depending on the factor to be removed. In general, there may be a specific factor removal neural network for each factor. In other words, there may be n factor removal neural networks if there are n factors to be removed.
  • the feature extraction neural network may be used to extract features.
  • the classification neural network may be used for classification.
  • the feature extraction neural network may be any suitable feature extraction neural network for example depending on the feature to be extracted.
  • the classification neural network may be any suitable classification neural network for example depending on the feature to be classified.
  • Each of the factor removal neural network, the feature extraction neural network and the classification neural network may comprise k layers, wherein k ≥ 3.
  • FIG 4 schematically shows a neural network 400 used for the training stage according to an embodiment of the disclosure, wherein the neural network 400 can be used to process a sample degraded by 2 factors.
  • the neural network 400 may comprise three parts: a factor removal part 402, a feature extraction part 404 and a classification part 406. It is noted that the neural network 400 can be easily expanded to any other neural network which can process a sample degraded by more than 2 factors. The process 200 will be described in detail with reference to Figures 2 and 4.
  • the process 200 may start at block 202 where the parameters/weights of the neural network 400 (the factor removal neural network, the feature extraction neural network and the classification neural network) are initialized with for example random values. Parameters like the number of filters, filter sizes, architecture of the network etc. have all been fixed before block 202 and do not change during the training stage.
  • the electronic apparatus 10 receives a set of pairs of training samples with labels, wherein each pair of training samples contains a first training sample degraded by n ≥ 2 factors and a second training sample where a factor j ∈ [1, ..., n] of the n factors does not degrade the first training sample.
  • the training sample may be any suitable sample which can be processed by the neural network 400, such as image, audio or text.
  • the label may indicate the classification of the training sample.
  • the set of pairs of training samples with labels may be pre-stored in a memory of the electronic apparatus 10, or retrieved from a network location or a local location.
  • the factors may be any factors which can degrade the sample such as image, audio, text or any other suitable sample.
  • the training sample may be an image, and the factors may comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball and pollen.
  • a pair of training samples contains the first training sample degraded by the 2 factors and the second training sample where a factor j ∈ [1, 2] does not degrade the first training sample.
  • the first training sample (such as an image) shown by Input is input respectively to the factor removal neural networks 410 and 408 and the second training sample shown by Ground truth1 and Ground truth2 may be stored in the neural network 400, wherein the Ground truth1 stands for a second training sample where factor 1 does not degrade the first training sample, and Ground truth2 stands for a second training sample where factor 2 does not degrade the first training sample.
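  • As an illustration of how such pairs might be obtained, the following sketch synthesizes a two-factor pair from a clean, labelled image. The degradation functions, their parameters and the Python/NumPy formulation are assumptions made purely for illustration; the disclosure does not prescribe how the pairs are produced.

```python
# A minimal sketch (not from the patent) of synthesizing paired training samples
# for two factors from a clean, labelled image. "add_haze" and "add_dark_light"
# and all parameter values below are assumptions, not the patent's procedure.
import numpy as np

def add_haze(img, t=0.6, airlight=0.9):
    # Simple atmospheric-scattering-style haze: observed = clean*t + A*(1-t)
    return img * t + airlight * (1.0 - t)

def add_dark_light(img, gain=0.4):
    # Global illumination drop imitating a dark-light condition
    return img * gain

def make_training_pair(clean_img, label):
    """Return (first_sample, ground_truth1, ground_truth2, label).

    first_sample  : degraded by both factor 1 (haze) and factor 2 (dark light)
    ground_truth1 : the same scene without factor 1 (only dark light remains)
    ground_truth2 : the same scene without factor 2 (only haze remains)
    """
    first_sample = add_dark_light(add_haze(clean_img))
    ground_truth1 = add_dark_light(clean_img)   # factor 1 absent
    ground_truth2 = add_haze(clean_img)         # factor 2 absent
    return first_sample, ground_truth1, ground_truth2, label

# Example: a random "clean image" stands in for real labelled data.
clean = np.random.rand(3, 64, 64).astype(np.float32)
sample, gt1, gt2, y = make_training_pair(clean, label=1)
```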
  • the electronic apparatus 10 may perform the following operations for the factor j of each pair of training samples: remove the factor j from the first training sample by the factor removal neural network; compute residual information R j corresponding to the factor j based on the first training sample and the output C j of the factor removal neural network; compute a loss L j of the factor removal neural network based on the difference between the output C j and the second training sample; compute a difference D j between the output C j and a sum of residual information for the n-1 factor (s) except j; extract a feature from the difference D j by the feature extraction neural network.
  • factors 1 and 2 may be removed respectively from the first training sample by the factor removal neural networks 408 and 410.
  • the factor removal neural networks 408 and 410 may be different neural networks each of which is suitable for removing a specific factor from the first training sample.
  • the factor removal neural networks 408 and 410 may each contain a plurality of layers.
  • the number of layers of the factor removal neural networks 408 and 410 may be different though the same number of layers m is shown in Figure 4.
  • the layers of the factor removal neural networks 408 and 410 are shown in Figure 4. The output C 1 of the factor removal neural network 408 stands for the estimated sample where factor 1 has been removed from the first training sample, and the output C 2 of the factor removal neural network 410 stands for the estimated sample where factor 2 has been removed from the first training sample.
  • residual information R 1 corresponding to the factor 1 is computed by subtracting the output of the factor removal neural network 408 from the first training sample.
  • residual information R 2 corresponding to the factor 2 is computed by subtracting the output of the factor removal neural network 410 from the first training sample.
  • the loss Loss1 of the factor removal neural network 408 may be computed by subtracting the output of the factor removal neural network 408 from the second training sample Ground truth1.
  • the loss Loss2 of the factor removal neural network 410 may be computed by subtracting the output of the factor removal neural network 410 from the second training sample Ground truth2.
  • the differences D 1 = C 1 - R 2 and D 2 = C 2 - R 1 may each stand for a sample where both factor 1 and factor 2 are removed from the first training sample.
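  • The following worked equations (not part of the disclosure) indicate why these cross-residual differences may approximate a doubly restored sample, under the simplifying assumption that the two degradations combine additively with the clean sample S:

```latex
% Illustration only: assume the degraded input is I = S + r_1 + r_2, where S is
% the clean sample and r_1, r_2 are the corruptions caused by factors 1 and 2.
\begin{aligned}
C_1 &\approx S + r_2,       & C_2 &\approx S + r_1,\\
R_1 &= I - C_1 \approx r_1, & R_2 &= I - C_2 \approx r_2,\\
D_1 &= C_1 - R_2 \approx S, & D_2 &= C_2 - R_1 \approx S.
\end{aligned}
```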
  • the differences D 1 and D 2 may be input to the feature extraction neural networks 412 and 414, respectively, to extract features from them.
  • the output of the feature extraction neural network 412 may stand for the feature extracted from the difference D 1 , and the output of the feature extraction neural network 414 may stand for the feature extracted from the difference D 2 .
  • the feature extraction neural network 412 and 414 may be the same feature extraction neural network.
  • the feature extraction neural network 412 and 414 may each contain a plurality of layers.
  • the layers of the feature extraction neural networks 412 and 414 are shown in Figure 4.
  • the feature extraction neural network 412 and 414 may be any suitable feature extraction neural network for example depending on the features to be extracted.
  • the electronic apparatus 10 may stack n features to input to the classification neural network.
  • the feature layer output by the feature extraction neural network 412 and the feature layer output by the feature extraction neural network 414 are stacked to form the first layer E 1 of the classification neural network 416.
  • the stacking operation may further comprise convolution operation, activation operation and pooling operation.
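  • A minimal PyTorch sketch of this stacking step is given below; the feature map shapes, channel counts and kernel sizes are illustrative assumptions, not values taken from the disclosure.

```python
# Stacking two extracted feature maps along the channel axis, then applying a
# convolution, an activation and a pooling step to form the first layer E1 of
# the classification part. All sizes here are illustrative guesses.
import torch
import torch.nn as nn

feat1 = torch.randn(1, 32, 16, 16)   # feature from difference D1 (assumed shape)
feat2 = torch.randn(1, 32, 16, 16)   # feature from difference D2 (assumed shape)

stacked = torch.cat([feat1, feat2], dim=1)          # stack: (1, 64, 16, 16)
e1 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),    # convolution operation
    nn.ReLU(inplace=True),                          # activation operation
    nn.MaxPool2d(2),                                # pooling operation
)(stacked)                                          # E1: (1, 64, 8, 8)
print(e1.shape)
```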
  • the classification neural network 416 may comprise k layers denoted by E 1 , E 2 , ..., E k , wherein k ≥ 3.
  • the classification neural network 416 may be any suitable classification neural network for example depending on the features to be classified.
  • the last layer E k of the classification neural network 416 is the classification/detection result.
  • the electronic apparatus 10 may compute classification loss based on the result of classification neural network and the label of the first training sample. For example, as shown in Figure 4, the electronic apparatus 10 may compute the classification loss at block 418.
  • the electronic apparatus 10 may add n losses L j and the classification loss to form joint loss.
  • the electronic apparatus 10 may add Loss1, Loss2 and the classification loss to form the joint loss at block 420.
  • the electronic apparatus 10 may learn the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network by minimizing the joint loss with the standard back-propagation algorithm. It is noted that the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network may be learned by minimizing the classification loss with the standard back-propagation algorithm in other embodiments, and in this case, the computation of Loss1 and Loss2, and the adding operation may be omitted.
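  • The following self-contained PyTorch sketch puts the training stage of Figure 4 together for the two-factor case. The layer configurations, the L1 form of the restoration losses Loss1/Loss2, the cross-entropy classification loss and all hyper-parameters are assumptions made for illustration; the disclosure leaves these choices open.

```python
# A sketch (assumptions throughout) of one training step for the two-factor
# network of Figure 4: two factor removal networks, two feature extraction
# networks, a classification network, and a joint loss formed from the two
# restoration losses plus the classification loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_stack(cin, cout, layers=3):
    # Small convolutional stack; depth and width are illustrative, not the patent's.
    mods = []
    for i in range(layers):
        mods += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1), nn.ReLU()]
    return nn.Sequential(*mods)

class TwoFactorDetector(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.removal = nn.ModuleList([conv_stack(3, 3) for _ in range(2)])    # 408, 410
        self.feature = nn.ModuleList([conv_stack(3, 32) for _ in range(2)])   # 412, 414
        self.classifier = nn.Sequential(                                      # 416
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, num_classes))

    def forward(self, x):
        c = [net(x) for net in self.removal]          # C1, C2: factor removed
        r = [x - ci for ci in c]                      # R1, R2: residual information
        d = [c[0] - r[1], c[1] - r[0]]                # D1 = C1 - R2, D2 = C2 - R1
        feats = [net(di) for net, di in zip(self.feature, d)]
        logits = self.classifier(torch.cat(feats, dim=1))   # stack then classify
        return logits, c

model = TwoFactorDetector()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

# Dummy batch standing in for (Input, Ground truth1, Ground truth2, label).
x = torch.rand(4, 3, 64, 64)
gt1, gt2 = torch.rand_like(x), torch.rand_like(x)
labels = torch.randint(0, 2, (4,))

logits, (c1, c2) = model(x)
loss1 = F.l1_loss(c1, gt1)                  # Loss1 (restoration loss, assumed L1)
loss2 = F.l1_loss(c2, gt2)                  # Loss2
cls_loss = F.cross_entropy(logits, labels)  # classification loss
joint_loss = loss1 + loss2 + cls_loss       # joint loss minimized by back-propagation
opt.zero_grad()
joint_loss.backward()
opt.step()
```

  • In this sketch the joint loss is simply the unweighted sum of the two restoration losses and the classification loss, mirroring the adding operation at block 420; a practical implementation might weight the terms differently.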
  • FIG 3 is a flow chart depicting a process 300 of a testing stage of a neural network according to embodiments of the present disclosure, which may be performed at an apparatus such as the electronic apparatus 10 (for example an advanced driver assistance system (ADAS) or a self-driving apparatus) of Figure 1.
  • the electronic apparatus 10 may provide means for accomplishing various parts of the process 300 as well as means for accomplishing other processes in conjunction with other components.
  • the neural network has been trained by using the process 200 of Figure 2.
  • the neural network may comprise a factor removal neural network, a feature extraction neural network and a classification neural network.
  • Figure 5 schematically shows a neural network 500 used for the testing stage according to an embodiment of the disclosure, wherein the neural network 500 can be used to process a sample degraded by 2 factors.
  • the neural network 500 may comprise three parts: a factor removal part 502, a feature extraction part 504 and a classification part 506. It is noted that the neural network 500 can be easily expanded to any other neural network which can process a sample degraded by more than 2 factors.
  • the process 300 will be described in detail with reference to Figures 3 and 5.
  • the process 300 may start at block 302 where the parameters/weights of the neural network 500 (the factor removal neural network, the feature extraction neural network and the classification neural network) are initialized with the values obtained in the training stage.
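  • A brief sketch of carrying the learned parameters over from the training stage to the testing stage is shown below; it reuses the hypothetical TwoFactorDetector class from the training sketch above, and the checkpoint file name is likewise an assumption.

```python
# Persisting the parameters learned in the training stage (Figure 4) and
# initializing the testing-stage network (Figure 5) with them.
# TwoFactorDetector is the hypothetical class defined in the training sketch.
import torch

trained = TwoFactorDetector()            # in practice, the model trained as above
torch.save(trained.state_dict(), "two_factor_detector.pt")   # hypothetical file name

test_model = TwoFactorDetector()         # the testing-stage network shares the weights
test_model.load_state_dict(torch.load("two_factor_detector.pt"))
test_model.eval()                        # parameters are fixed during testing
```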
  • the electronic apparatus 10 receives a sample degraded by n factors, wherein n ≥ 2.
  • the sample may be any suitable sample which can be processed by the neural network 500, such as image, audio or text.
  • the sample may be pre-stored in a memory of the electronic apparatus 10, retrieved from a network location or a local location, or captured in real time for example by the ADAS/autonomous vehicle.
  • the sample may be an image, and the factors may comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball and pollen.
  • the sample (such as an image) shown by Input is input respectively to the factor removal neural networks 510 and 508.
  • the electronic apparatus 10 may perform the following operations for each factor i ∈ [1, ..., n] of the n factors: remove the factor i from the sample by a factor removal neural network; compute residual information R i corresponding to the factor i based on the sample and the output C i of the factor removal neural network; compute a difference D i between the output C i and a sum of residual information for the n-1 factor (s) except the factor i; extract a feature from the difference D i by the feature extraction neural network.
  • factors 1 and 2 may be removed respectively from the sample by the factor removal neural networks 508 and 510.
  • the factor removal neural networks 508 and 510 may be different neural networks each of which is suitable for removing a specific factor from the sample.
  • the factor removal neural networks 508 and 510 may each contain a plurality of layers.
  • the number of layers of the factor removal neural networks 508 and 510 may be different though the same number of layers m is shown in Figure 5.
  • the layers of the factor removal neural networks 508 and 510 are shown in Figure 5. The output C 1 of the factor removal neural network 508 stands for the estimated sample where factor 1 has been removed from the sample, and the output C 2 of the factor removal neural network 510 stands for the estimated sample where factor 2 has been removed from the sample.
  • residual information R 1 corresponding to the factor 1 is computed by subtracting the output of the factor removal neural network 508 from the sample.
  • Residual information R 2 corresponding to the factor 2 is computed by subtracting the output of the factor removal neural network 510 from the sample.
  • the differences D 1 = C 1 - R 2 and D 2 = C 2 - R 1 may each stand for a processed sample where both factor 1 and factor 2 are removed from the sample.
  • the differences D 1 and D 2 may be input to the feature extraction neural networks 512 and 514, respectively, to extract features from them.
  • the output of the feature extraction neural network 512 may stand for the feature extracted from the difference D 1 , and the output of the feature extraction neural network 514 may stand for the feature extracted from the difference D 2 .
  • the feature extraction neural network 512 and 514 may be the same feature extraction neural network.
  • the feature extraction neural networks 512 and 514 may each contain a plurality of layers, as shown in Figure 5. In another embodiment, there may be a single feature extraction neural network in the neural network 500, and each difference may be sequentially input to that feature extraction neural network.
  • the feature extraction neural network 512 and 514 may be any suitable feature extraction neural network for example depending on the features to be extracted.
  • the electronic apparatus 10 may stack n features to input to the classification neural network.
  • the feature layer output by the feature extraction neural network 512 and the feature layer output by the feature extraction neural network 514 are stacked to form the first layer E 1 of the classification neural network 516.
  • the stacking operation may further comprise convolution operation, activation operation and pooling operation.
  • the classification neural network 516 may comprise k layers denoted by E 1 , E 2 , ..., E k , wherein k ≥ 3.
  • the classification neural network 516 may be any suitable classification neural network for example depending on the features to be classified.
  • the last layer E k of the classification neural network 516 may be outputted as a detection/classification result at block 310.
  • the process 300 may be used in the ADAS/autonomous vehicle, such as for object detection.
  • a vision system is equipped with the ADAS or autonomous vehicle.
  • the process 300 can be integrated into the vision system.
  • an image is captured by a camera and the important objects such as pedestrians and bicycles are detected from the image by the process 300.
  • some form of warning (e.g., a warning voice) may be generated if important objects (e.g., pedestrians) are detected, so that the driver in the vehicle can pay attention to the objects and try to avoid a traffic accident.
  • the detected objects may be used as inputs of a control module and the control module takes proper action according to the objects.
  • the method constructs a neural network (such as a deep convolutional neural network) which greatly improves the performance of object detection systems when the sample input to the object detection systems is degraded by at least two factors.
  • the restoration residual corresponding to one factor, such as R 1 in Figure 4, is used to deal with the sample degraded by another factor, which greatly weakens the negative influence of the factors on the sample.
  • the adverse factor removal, feature extraction, and classification are jointly performed under the framework of a neural network such as the deep convolutional neural network.
  • an apparatus for object detection may comprise means configured to carry out the processes described above.
  • the apparatus comprises means configured to receive a sample degraded by at least two factors; means configured to perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; means configured to stack the feature extracted by each feature extraction neural network to input to a classification neural network; and means configured to output the result of classification neural network as a detection result.
  • the apparatus further comprises means configured to train the factor removal neural network, the feature extraction neural network and the classification neural network.
  • the apparatus further comprises means configured to receive a set of pairs of training samples with labels, wherein each pair of training samples contains a first training sample degraded by the at least two factors and a second training sample where a factor of the at least two factors does not degrade the first training sample; means configured to perform the following operations for the factor of each pair of training samples: remove the factor from the first training sample by the factor removal neural network; compute residual information corresponding to the factor based on the first training sample and the output of the factor removal neural network; compute a loss of the factor removal neural network based on the difference between the output and the second training sample; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by the feature extraction neural network; means configured to stack the feature extracted by each feature extraction neural network to input to the classification neural network; means configured to compute a classification loss based on the result of the classification neural network and the label of the first training sample; means configured to add the loss of each factor removal neural network and the classification loss to form a joint loss; and means configured to learn the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network by minimizing the joint loss.
  • the sample and the training sample comprise one of image, audio and text.
  • the sample and the training sample are images, and the factors comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball and pollen.
  • the apparatus is used in an advanced driver assistance system/autonomous vehicle.
  • the neural network comprises a convolutional neural network.
  • any of the components of the apparatus described above can be implemented as hardware or software modules.
  • if implemented as software modules, they can be embodied on a tangible computer-readable recordable storage medium. All of the software modules (or any subset thereof) can be on the same medium, or each can be on a different medium, for example.
  • the software modules can run, for example, on a hardware processor. The method steps can then be carried out using the distinct software modules, as described above, executing on a hardware processor.
  • an aspect of the disclosure can make use of software running on a general purpose computer or workstation.
  • Such an implementation might employ, for example, a processor, a memory, and an input/output interface formed, for example, by a display and a keyboard.
  • the term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor.
  • the term “memory” is intended to include memory associated with a processor or CPU, such as, for example, RAM (random access memory) , ROM (read only memory) , a fixed memory device (for example, hard drive) , a removable memory device (for example, diskette) , a flash memory and the like.
  • the processor, memory, and input/output interface such as display and keyboard can be interconnected, for example, via bus as part of a data processing unit. Suitable interconnections, for example via bus, can also be provided to a network interface, such as a network card, which can be provided to interface with a computer network, and to a media interface, such as a diskette or CD-ROM drive, which can be provided to interface with media.
  • computer software including instructions or code for performing the methodologies of the disclosure, as described herein, may be stored in associated memory devices (for example, ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (for example, into RAM) and implemented by a CPU.
  • Such software could include, but is not limited to, firmware, resident software, microcode, and the like.
  • aspects of the disclosure may take the form of a computer program product embodied in a computer readable medium having computer readable program code embodied thereon.
  • computer readable media may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Computer program code for carrying out operations for aspects of the disclosure may be written in any combination of at least one programming language, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • each block in the flowchart or block diagrams may represent a module, component, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function (s) .
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • connection or coupling means any connection or coupling, either direct or indirect, between two or more elements, and may encompass the presence of one or more intermediate elements between two elements that are “connected” or “coupled” together.
  • the coupling or connection between the elements can be physical, logical, or a combination thereof.
  • two elements may be considered to be “connected” or “coupled” together by the use of one or more wires, cables and/or printed electrical connections, as well as by the use of electromagnetic energy, such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region and the optical region (both visible and invisible) , as several non-limiting and non-exhaustive examples.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Apparatus, method, computer program product and computer readable medium are disclosed for object detection. The method comprises: receiving a sample degraded by at least two factors (304); performing the following operations for each factor of the at least two factors: removing a factor of the at least two factors from the sample by a factor removal neural network; computing residual information corresponding to the factor based on the sample and the output of the factor removal neural network; computing a difference between the output and a sum of residual information for all the other factor (s) except the factor; extracting a feature from the difference by a feature extraction neural network; stacking the feature extracted by each feature extraction neural network to input to a classification neural network (308); and outputting the result of the classification neural network as a detection result (310).

Description

APPARATUS, METHOD AND COMPUTER PROGRAM PRODUCT FOR OBJECT DETECTION

Field of the Invention
Embodiments of the disclosure generally relate to information technologies, and, more particularly, to object detection.
Background
Object detection plays an important role in many applications. For example, object detection systems are broadly used in computer vision, automatic speech recognition, natural language processing, drug discovery and toxicology, customer relationship management, recommendation systems, and biomedical informatics. As an example, in computer vision, object detection systems can be used in video surveillance, traffic surveillance, driver assistant systems, autonomous vehicles, traffic monitoring, human identification, human-computer interaction, public security, event detection, tracking, frontier guards and customs, scenario analysis and classification, image indexing and retrieval, etc.
However, the input/sample of object detection systems may be degraded by at least two factors which may greatly influence the performance of the object detection systems. For example, in bad weather caused by several factors, an image captured by the driver assistant system may be degraded by at least two of haze, rain, fog, sand, dust, sand storm, hailstone, dark light, etc. As an example, haze and dark light are two common sources of image quality degradation. They hamper the visibility of the scene and its objects. The intensity, hue and saturation of the scene and its objects are also altered by haze and dark light. The performance of the driver assistant system degrades drastically in complex and challenging weather.
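For background only (this model is not part of the present disclosure), hazy images are often described by the standard atmospheric scattering model, which makes explicit why haze reduces contrast and shifts the observed intensity and colour:

```latex
% I(x): observed intensity at pixel x, J(x): scene radiance, A: global airlight,
% t(x): transmission along the line of sight (smaller t means denser haze).
I(x) = J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr)
```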
Therefore, a solution is required for improving the performance of object detection/recognition systems when the input of the object detection/recognition systems is degraded by at least two factors.
Summary
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
According to one aspect of the disclosure, there is provided an apparatus. The apparatus may comprise at least one processor; and at least one memory including computer program code, the memory and the computer program code configured to, working with the at least one processor, cause the apparatus to receive a sample degraded by at least two factors; perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; stack the feature extracted by each feature extraction neural network to input to a classification neural network; and output the result of the classification neural network as a detection result.
According to another aspect of the present disclosure, there is provided a method. The method may comprise receiving a sample degraded by at least two factors; performing the following operations for each factor of the at least two factors: removing a factor of the at least two factors from the sample by a factor removal neural network; computing residual information corresponding to the factor based on the sample and the output of the factor removal neural network; computing a difference between the output and a sum of residual information for all the other factor (s) except the factor; extracting a feature from the difference by a feature extraction neural network; stacking the feature extracted by each feature extraction neural network to input to a classification neural network; and outputting the result of the classification neural network as a detection result.
According to still another aspect of the present disclosure, there is provided a computer program product embodied on a distribution medium readable by a computer and comprising program instructions which, when loaded into a computer, cause a processor to receive a sample degraded by at least two factors; perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; stack the feature extracted by each feature extraction neural network to input to a classification neural network; and output the result of the classification neural network as a detection result.
According to still another aspect of the present disclosure, there is provided a non-transitory computer readable medium having encoded thereon statements and instructions to cause a processor to receive a sample degraded by at least two factors; perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; stack the feature extracted by each feature extraction neural network to input to a classification neural network; and output the result of the classification neural network as a detection result.
According to still another aspect of the present disclosure, there is provided an apparatus comprising means configured to receive a sample degraded by at least two factors; means configured to perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; means configured to stack the feature extracted by each feature extraction neural network to input to a classification neural network; and means configured to output the result of the classification neural network as a detection result.
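The per-factor loop common to all of the above aspects can be summarized by the following sketch, written here in Python/PyTorch purely for illustration. The factor removal networks, feature extraction networks and classifier are passed in as callables because the disclosure does not fix their architectures, and the function name and signature are assumptions.

```python
# A compact sketch of the claimed detection method for n >= 2 factors.
from typing import Callable, Sequence
import torch

def detect(sample: torch.Tensor,
           removal_nets: Sequence[Callable[[torch.Tensor], torch.Tensor]],
           feature_nets: Sequence[Callable[[torch.Tensor], torch.Tensor]],
           classifier: Callable[[torch.Tensor], torch.Tensor]) -> torch.Tensor:
    # Remove each factor and compute the corresponding residual information.
    outputs = [net(sample) for net in removal_nets]      # C_i
    residuals = [sample - c for c in outputs]            # R_i = sample - C_i
    features = []
    for i, c in enumerate(outputs):
        # Difference between C_i and the sum of the residuals of the other factors.
        others = sum(r for j, r in enumerate(residuals) if j != i)
        d = c - others                                   # D_i
        features.append(feature_nets[i](d))              # feature extracted from D_i
    stacked = torch.cat(features, dim=1)                 # stack the features
    return classifier(stacked)                           # detection result
```

For n = 2 factors this reduces to D 1 = C 1 - R 2 and D 2 = C 2 - R 1, matching the two-factor networks of Figures 4 and 5.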
These and other objects, features and advantages of the disclosure will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
Brief Description of the Drawings
Figure 1 is a simplified block diagram showing an apparatus according to an embodiment;
Figure 2 is a flow chart depicting a process of a training stage of a neural network according to an embodiment of the present disclosure;
Figure 3 is a flow chart depicting a process of a testing stage of a neural network according to embodiments of the present disclosure;
Figure 4 schematically shows a neural network used for the training stage according to an embodiment of the disclosure; and
Figure 5 schematically shows a neural network used for the testing stage according to an embodiment of the disclosure.
Detailed Description
For the purpose of explanation, details are set forth in the following description in order to provide a thorough understanding of the embodiments disclosed. It is apparent, however, to those skilled in the art that the embodiments may be implemented without these specific details or with an equivalent arrangement. Various embodiments of the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like reference numerals refer to like elements throughout. As used herein, the terms "data, " "content, " "information, " and similar terms may be used interchangeably to refer to data capable of being transmitted, received and/or stored in  accordance with embodiments of the present disclosure. Thus, use of any such terms should not be taken to limit the spirit and scope of embodiments of the present disclosure.
Additionally, as used herein, the term 'circuitry' refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even if the software or firmware is not physically present. This definition of 'circuitry' applies to all uses of this term herein, including in any claims. As a further example, as used herein, the term 'circuitry' also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware. As another example, the term 'circuitry' as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network apparatus, other network apparatus, and/or other computing apparatus.
As defined herein, a "non-transitory computer-readable medium, " which refers to a physical medium (e.g., volatile or non-volatile memory device) , can be differentiated from a "transitory computer-readable medium, " which refers to an electromagnetic signal.
It is noted that though the embodiments are mainly described in the context of a deep convolutional neural network, they are not limited to this but can be applied to any suitable neural network. Moreover, the embodiments of the disclosure can be applied to automatic speech recognition, natural language processing, drug discovery and toxicology, customer relationship management, recommendation systems, biomedical informatics, etc., though they are mainly discussed in the context of image recognition.
Figure 1 is a simplified block diagram showing an apparatus, such as an electronic apparatus 10, in which various embodiments of the disclosure may be applied. It should be understood, however, that the electronic apparatus as illustrated and hereinafter described is merely illustrative of an apparatus that could benefit from embodiments of the disclosure and, therefore, should not be taken to limit the scope of the disclosure. While the electronic apparatus 10 is illustrated and will be hereinafter described for purposes of example, other types of apparatuses may readily employ embodiments of the disclosure. The electronic apparatus 10 may be a portable digital assistant (PDA), a user equipment, a mobile computer, a desktop computer, a smart television, an intelligent glass, a gaming apparatus, a laptop computer, a media player, a camera, a video recorder, a mobile phone, a global positioning system (GPS) apparatus, a smart phone, a tablet, a server, a thin client, a cloud computer, a virtual server, a set-top box, a computing device, a distributed system, a smart glass, a vehicle navigation system, an advanced driver assistance system (ADAS), a self-driving apparatus, a video surveillance apparatus, an intelligent robot, a virtual reality apparatus and/or any other type of electronic system. The electronic apparatus 10 may run with any kind of operating system including, but not limited to, Windows, Linux, UNIX, Android, iOS and their variants. Moreover, the apparatus of at least one example embodiment need not be the entire electronic apparatus, but may be a component or group of components of the electronic apparatus in other example embodiments.
Furthermore, the electronic apparatus may readily employ embodiments of the disclosure regardless of their intent to provide mobility. In this regard, it should be understood that embodiments of the disclosure may be utilized in conjunction with a variety of applications.
In at least one example embodiment, the electronic apparatus 10 may comprise processor 11 and memory 12. Processor 11 may be any type of processor, controller, embedded controller, processor core, graphics processing unit (GPU) and/or the like. In at least one example embodiment, processor 11 utilizes computer program code to cause an apparatus to perform one or more actions. Memory 12 may comprise volatile memory, such as volatile Random Access Memory (RAM) including a cache area for the temporary storage of data and/or other memory, for example, non-volatile memory, which may be embedded and/or may be removable. The non-volatile memory may comprise an EEPROM, flash memory and/or the like. Memory 12 may store any of a number of pieces of information, and data. The information and data may be used by the electronic apparatus 10 to implement one or more functions of the electronic apparatus 10, such as the functions described herein. In at least one example embodiment, memory 12 includes computer program code such that the memory and the computer program code are configured to, working with the processor, cause the apparatus to perform one or more actions described herein.
The electronic apparatus 10 may further comprise a communication device 15. In at least one example embodiment, communication device 15 comprises an antenna, (or multiple antennae) , a wired connector, and/or the like in operable communication with a transmitter and/or a receiver. In at least one example embodiment, processor 11 provides signals to a transmitter and/or receives signals from a receiver. The signals may comprise signaling information in accordance with a communications interface  standard, user speech, received data, user generated data, and/or the like. Communication device 15 may operate with one or more air interface standards, communication protocols, modulation types, and access types. By way of illustration, the electronic communication device 15 may operate in accordance with second-generation (2G) wireless communication protocols IS-136 (time division multiple access (TDMA) ) , Global System for Mobile communications (GSM) , and IS-95 (code division multiple access (CDMA) ) , with third-generation (3G) wireless communication protocols, such as Universal Mobile Telecommunications System (UMTS) , CDMA2000, wideband CDMA (WCDMA) and time division-synchronous CDMA (TD-SCDMA) , and/or with fourth-generation (4G) wireless communication protocols, wireless networking protocols, such as 802.11, short-range wireless protocols, such as Bluetooth, and/or the like. Communication device 15 may operate in accordance with wireline protocols, such as Ethernet, digital subscriber line (DSL) , and/or the like.
Processor 11 may comprise means, such as circuitry, for implementing audio, video, communication, navigation, logic functions, and/or the like, as well as for implementing embodiments of the disclosure including, for example, one or more of the functions described herein. For example, processor 11 may comprise means, such as a digital signal processor device, a microprocessor device, various analog to digital converters, digital to analog converters, processing circuitry and other support circuits, for performing various functions including, for example, one or more of the functions described herein. The apparatus may perform control and signal processing functions of the electronic apparatus 10 among these devices according to their respective capabilities. The processor 11 thus may comprise the functionality to encode and interleave message and data prior to modulation and transmission. The processor 11 may additionally comprise an internal voice coder, and may comprise an  internal data modem. Further, the processor 11 may comprise functionality to operate one or more software programs, which may be stored in memory and which may, among other things, cause the processor 11 to implement at least one embodiment including, for example, one or more of the functions described herein. For example, the processor 11 may operate a connectivity program, such as a conventional internet browser. The connectivity program may allow the electronic apparatus 10 to transmit and receive internet content, such as location-based content and/or other web page content, according to a Transmission Control Protocol (TCP) , Internet Protocol (IP) , User Datagram Protocol (UDP) , Internet Message Access Protocol (IMAP) , Post Office Protocol (POP) , Simple Mail Transfer Protocol (SMTP) , Wireless Application Protocol (WAP) , Hypertext Transfer Protocol (HTTP) , and/or the like, for example.
The electronic apparatus 10 may comprise a user interface for providing output and/or receiving input. The electronic apparatus 10 may comprise an output device 14. Output device 14 may comprise an audio output device, such as a ringer, an earphone, a speaker, and/or the like. Output device 14 may comprise a tactile output device, such as a vibration transducer, an electronically deformable surface, an electronically deformable structure, and/or the like. Output Device 14 may comprise a visual output device, such as a display, a light, and/or the like. The electronic apparatus may comprise an input device 13. Input device 13 may comprise a light sensor, a proximity sensor, a microphone, a touch sensor, a force sensor, a button, a keypad, a motion sensor, a magnetic field sensor, a camera, a removable storage device and/or the like. A touch sensor and a display may be characterized as a touch display. In an embodiment comprising a touch display, the touch display may be configured to receive input from a single point of contact, multiple points of contact, and/or the like. In such an embodiment, the touch display and/or the processor may  determine input based, at least in part, on position, motion, speed, contact area, and/or the like.
The electronic apparatus 10 may include any of a variety of touch displays including those that are configured to enable touch recognition by any of resistive, capacitive, infrared, strain gauge, surface wave, optical imaging, dispersive signal technology, acoustic pulse recognition or other techniques, and to then provide signals indicative of the location and other parameters associated with the touch. Additionally, the touch display may be configured to receive an indication of an input in the form of a touch event which may be defined as an actual physical contact between a selection object (e.g., a finger, stylus, pen, pencil, or other pointing device) and the touch display. Alternatively, a touch event may be defined as bringing the selection object in proximity to the touch display, hovering over a displayed object or approaching an object within a predefined distance, even though physical contact is not made with the touch display. As such, a touch input may comprise any input that is detected by a touch display including touch events that involve actual physical contact and touch events that do not involve physical contact but that are otherwise detected by the touch display, such as a result of the proximity of the selection object to the touch display. A touch display may be capable of receiving information associated with force applied to the touch screen in relation to the touch input. For example, the touch screen may differentiate between a heavy press touch input and a light press touch input. In at least one example embodiment, a display may display two-dimensional information, three-dimensional information and/or the like.
Input device 13 may comprise a media capturing element. The media capturing element may be any means for capturing an image, video, and/or audio for storage, display or transmission. For example, in at least one example embodiment in  which the media capturing element is a camera module, the camera module may comprise a digital camera which may form a digital image file from a captured image. As such, the camera module may comprise hardware, such as a lens or other optical component (s) , and/or software necessary for creating a digital image file from a captured image. Alternatively, the camera module may comprise only the hardware for viewing an image, while a memory device of the electronic apparatus 10 stores instructions for execution by the processor 11 in the form of software for creating a digital image file from a captured image. In at least one example embodiment, the camera module may further comprise a processing element such as a co-processor that assists the processor 11 in processing image data and an encoder and/or decoder for compressing and/or decompressing image data. The encoder and/or decoder may encode and/or decode according to a standard format, for example, a Joint Photographic Experts Group (JPEG) standard format, a moving picture expert group (MPEG) standard format, a Video Coding Experts Group (VCEG) standard format or any other suitable standard formats.
Figure 2 is a flow chart depicting a process 200 of a training stage of a neural network according to an embodiment of the present disclosure, which may be performed at an apparatus such as the electronic apparatus 10 (for example a distributed system or cloud computing) of Figure 1. As such, the electronic apparatus 10 may provide means for accomplishing various parts of the process 200 as well as means for accomplishing other processes in conjunction with other components.
The neural network may comprise a factor removal neural network, a feature extraction neural network and a classification neural network. The factor removal neural network may be used to remove a factor from the input/sample of the neural network. The factor removal neural network may be any suitable factor removal neural network, for example depending on the factor to be removed. In general, there may be a specific factor removal neural network for each factor. In other words, there may be n factor removal neural networks if there are n factors to be removed. The feature extraction neural network may be used to extract features. The classification neural network may be used for classification. The feature extraction neural network may be any suitable feature extraction neural network, for example depending on the feature to be extracted. Similarly, the classification neural network may be any suitable classification neural network, for example depending on the feature to be classified. Each of the factor removal neural network, the feature extraction neural network and the classification neural network may comprise k layers, wherein k≥3.
Figure 4 schematically shows a neural network 400 used for the training stage according to an embodiment of the disclosure, wherein the neural network 400 can be used to process a sample degraded by 2 factors. As shown in Figure 4, the neural network 400 may comprise three parts: a factor removal part 402, a feature extraction part 404 and a classification part 406. It is noted that the neural network 400 can be easily expanded to any other neural network which can process a sample degraded by more than 2 factors. The process 200 will be described in detail with reference to Figures 2 and 4.
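For illustration only, the three parts of the neural network 400 could be sketched in Python with the PyTorch library roughly as follows. The layer counts, channel widths and module names (FactorRemovalNet, FeatureExtractionNet, ClassificationNet) are assumptions made for this sketch and are not specified by the disclosure; the sketch only mirrors the roles of the parts 402, 404 and 406.

import torch
import torch.nn as nn

class FactorRemovalNet(nn.Module):
    """Estimates a version of the input in which one specific factor is removed (output Cj)."""
    def __init__(self, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, kernel_size=3, padding=1),
        )
    def forward(self, x):
        return self.body(x)

class FeatureExtractionNet(nn.Module):
    """Extracts a feature map from a difference Dj."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
        )
    def forward(self, d):
        return self.body(d)

class ClassificationNet(nn.Module):
    """Maps the stacked feature layer E1 to class scores (the last layer Ek)."""
    def __init__(self, in_channels=128, num_classes=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes),
        )
    def forward(self, e1):
        return self.body(e1)

Under these assumptions, two FactorRemovalNet instances (one per factor), two FeatureExtractionNet instances and one ClassificationNet instance would correspond to the parts 402, 404 and 406 of Figure 4, respectively.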
As shown in Figure 2, the process 200 may start at block 202 where the parameters/weights of the neural network 400 (the factor removal neural network, the feature extraction neural network and the classification neural network) are initialized with, for example, random values. Parameters such as the number of filters, the filter sizes and the architecture of the network have all been fixed before block 202 and do not change during the training stage.
At block 204, the electronic apparatus 10 receives a set of pairs of training samples with labels, wherein each pair of training samples contains a first training sample degraded by n (n≥2) factors and a second training sample where a factor j∈ [1, ..., n] of the n factors does not degrade the first training sample. The training sample may be any suitable sample which can be processed by the neural network 400, such as an image, audio or text. The label may indicate the classification of the training sample. The set of pairs of training samples with labels may be pre-stored in a memory of the electronic apparatus 10, or retrieved from a network location or a local location.
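As an illustration only (the container type is an assumption of this sketch, not part of the disclosure), one pair of training samples with its label could be held as:

from collections import namedtuple

# degraded: the first training sample, degraded by all n factors
# ground_truths: n second training samples, the j-th one not degraded by factor j
# label: classification label of the pair
TrainingPair = namedtuple("TrainingPair", ["degraded", "ground_truths", "label"])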
The factors may be any factors which can degrade the sample, whether the sample is an image, audio, text or any other suitable sample. In an embodiment, the training sample is an image, and the factors may comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball and pollen.
Turning to Figure 4, suppose that a pair of training samples contains the first training sample degraded by the 2 factors and the second training sample where a factor j∈ [1, 2] does not degrade the first training sample. The first training sample (such as an image) shown by Input is input respectively to the factor removal neural networks 410 and 408, and the second training sample shown by Ground truth1 and Ground truth2 may be stored in the neural network 400, wherein Ground truth1 stands for a second training sample where factor 1 does not degrade the first training sample, and Ground truth2 stands for a second training sample where factor 2 does not degrade the first training sample.
At block 206, the electronic apparatus 10 may perform the following operations for the factor j of each pair of training samples: remove the factor j from the first training sample by the factor removal neural network; compute residual information Rj corresponding to the factor j based on the first training sample and the output Cj of the factor removal neural network; compute a loss Lj of the factor removal neural network based on the difference between the output Cj and the second training sample; compute a difference Dj between the output Cj and a sum of residual information for the n-1 factor(s) except j; and extract a feature from the difference Dj by the feature extraction neural network.
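Continuing the hedged sketch above (all names remain assumptions of the sketch), the per-factor quantities Cj, Rj and Dj of block 206 might be computed as follows for an arbitrary number n of factors:

def per_factor_terms(sample, removal_nets):
    # sample: tensor degraded by n factors; removal_nets: one factor removal network per factor
    outputs = [net(sample) for net in removal_nets]           # C_1 ... C_n
    residuals = [sample - c for c in outputs]                 # R_j = sample - C_j
    diffs = []
    for j, c in enumerate(outputs):
        other_residuals = sum(r for i, r in enumerate(residuals) if i != j)
        diffs.append(c - other_residuals)                     # D_j = C_j - sum of the other residuals
    return outputs, residuals, diffs

The loss Lj of block 206 would then compare each Cj with the corresponding second training sample (Ground truth j); a concrete loss function is not fixed by the disclosure and is chosen only in the later training sketch.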
As shown in Figure 4, factors 1 and 2 may be removed respectively from the first training sample by the factor removal neural networks 408 and 410. In general, the factor removal neural networks 408 and 410 may be different neural networks, each of which is suitable for removing a specific factor from the first training sample. The factor removal neural networks 408 and 410 may each contain a plurality of layers, which are denoted by the corresponding layer symbols shown in Figure 4; the number of layers of the two networks may be different, though the same number of layers m is shown in Figure 4. The output C1 of the factor removal neural network 408 stands for the estimated sample where factor 1 has been removed from the first training sample, and the output C2 of the factor removal neural network 410 stands for the estimated sample where factor 2 has been removed from the first training sample.
Then residual information R1 corresponding to factor 1 is computed by subtracting the output C1 of the factor removal neural network 408 from the first training sample. Similarly, residual information R2 corresponding to factor 2 is computed by subtracting the output C2 of the factor removal neural network 410 from the first training sample.
The loss Loss1 of the factor removal neural network 408 may be computed by subtracting the output C1 of the factor removal neural network 408 from the second training sample Ground truth1. Similarly, the loss Loss2 of the factor removal neural network 410 may be computed by subtracting the output C2 of the factor removal neural network 410 from the second training sample Ground truth2.
A difference D1 is computed between the output C1 of the factor removal neural network 408 and a sum of residual information for the n-1 factor(s) except factor 1, wherein in this embodiment the sum of residual information is R2, so that D1 = C1 - R2. Similarly, a difference D2 is computed between the output C2 of the factor removal neural network 410 and a sum of residual information for the n-1 factor(s) except factor 2, wherein in this embodiment the sum of residual information is R1, so that D2 = C2 - R1. The differences D1 and D2 may each stand for a sample where both factor 1 and factor 2 are removed from the first training sample.
Then the differences D1 and D2 may be input to the feature extraction neural networks 412 and 414 respectively to extract features from the differences D1 and D2. The output of the feature extraction neural network 412 may stand for the feature extracted from the difference D1, and the output of the feature extraction neural network 414 may stand for the feature extracted from the difference D2. In general, the feature extraction neural networks 412 and 414 may be the same feature extraction neural network. The feature extraction neural networks 412 and 414 may each contain a plurality of layers, which are denoted by the corresponding layer symbols shown in Figure 4. In other embodiments, there may be one feature extraction neural network in the neural network 400 and each difference may be sequentially input to the feature extraction neural network. In addition, the feature extraction neural networks 412 and 414 may be any suitable feature extraction neural networks, for example depending on the features to be extracted.
Turning to Figure 2, at block 208, the electronic apparatus 10 may stack the n features to input to the classification neural network. For example, as shown in Figure 4, the feature layer output by the feature extraction neural network 412 and the feature layer output by the feature extraction neural network 414 are stacked to form the first layer E1 of the classification neural network 416. The stacking operation may further comprise a convolution operation, an activation operation and a pooling operation. The classification neural network 416 may comprise k layers denoted by E1, E2, …, Ek, wherein k≥3. The classification neural network 416 may be any suitable classification neural network, for example depending on the features to be classified. The last layer Ek of the classification neural network 416 is the classification/detection result.
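As a continuation of the same illustrative sketch, the stacking of block 208 can be expressed as a channel-wise concatenation of the extracted features to form E1, followed by the classification network; concatenation along the channel axis is one plausible realization of the stacking operation and is an assumption of the sketch.

def classify(diffs, extraction_nets, classifier):
    feats = [net(d) for net, d in zip(extraction_nets, diffs)]   # one feature per difference D_j
    e1 = torch.cat(feats, dim=1)                                 # stack along the channel axis to form E1
    return classifier(e1)                                        # output of the last layer Ek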
At block 210, the electronic apparatus 10 may compute classification loss based on the result of classification neural network and the label of the first training sample. For example, as shown in Figure 4, the electronic apparatus 10 may compute the classification loss at block 418.
At block 212, the electronic apparatus 10 may add n losses Lj and the classification loss to form joint loss. For example, as shown in Figure 4, the electronic  apparatus 10 may add Loss1, Loss2 and the classification loss to form the joint loss at block 420.
At block 214, the electronic apparatus 10 may learn the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network by minimizing the joint loss with the standard back-propagation algorithm. It is noted that the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network may be learned by minimizing the classification loss with the standard back-propagation algorithm in other embodiments, and in this case, the computation of Loss1 and Loss2, and the adding operation may be omitted.
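For blocks 210 to 214, a minimal training step under the same assumptions could look as follows. The use of mean-squared error for each per-factor loss Lossj and cross-entropy for the classification loss is an assumption of this sketch; the disclosure only states that the losses are based on the respective differences and that the joint loss is minimized with the standard back-propagation algorithm.

mse = nn.MSELoss()
cross_entropy = nn.CrossEntropyLoss()

def training_step(sample, ground_truths, label, removal_nets, extraction_nets, classifier, optimizer):
    # label: tensor of class indices for the batch
    outputs, residuals, diffs = per_factor_terms(sample, removal_nets)
    removal_losses = [mse(c, gt) for c, gt in zip(outputs, ground_truths)]   # Loss_1, Loss_2, ...
    logits = classify(diffs, extraction_nets, classifier)
    joint_loss = sum(removal_losses) + cross_entropy(logits, label)          # joint loss of block 212
    optimizer.zero_grad()
    joint_loss.backward()                                                    # standard back-propagation
    optimizer.step()
    return joint_loss.item()

An optimizer such as torch.optim.SGD over the parameters of all the factor removal, feature extraction and classification networks would then learn them jointly, as described at block 214.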
The trained neural network can then be used for classifying a sample such as image. Figure 3 is a flow chart depicting a process 300 of a testing stage of a neural network according to embodiments of the present disclosure, which may be performed at an apparatus such as the electronic apparatus 10 (for example an advanced driver assistance system (ADAS) or a self-driving apparatus) of Figure 1. As such, the electronic apparatus 10 may provide means for accomplishing various parts of the process 300 as well as means for accomplishing other processes in conjunction with other components. Moreover, the neural network has been trained by using the process 200 of Figure 2.
As described with reference to Figures 2 and 4, the neural network may comprise a factor removal neural network, a feature extraction neural network and a classification neural network. Figure 5 schematically shows a neural network 500 used for the testing stage according to an embodiment of the disclosure, wherein the neural network 500 can be used to process a sample degraded by 2 factors. As shown in Figure 5, the neural network 500 may comprise three parts: a factor removal part 502, a feature extraction part 504 and a classification part 506. It is noted that the neural network 500 can be easily expanded to any other neural network which can process a sample degraded by more than 2 factors. The process 300 will be described in detail with reference to Figures 3 and 5.
As shown in Figure 3, the process 300 may start at block 302 where the parameters/weights of the neural network 500 (the factor removal neural network, the feature extraction neural network and the classification neural network) are initialized with the values obtained in the training stage.
At block 304, the electronic apparatus 10 receives a sample degraded by n factors, wherein n≥2. The sample may be any suitable sample which can be processed by the neural network 500, such as an image, audio or text. The sample may be pre-stored in a memory of the electronic apparatus 10, retrieved from a network location or a local location, or captured in real time, for example by the ADAS/autonomous vehicle. In an embodiment, the sample is an image, and the factors may comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball and pollen. Turning to Figure 5, the sample (such as an image) shown by Input is input respectively to the factor removal neural networks 510 and 508.
At block 306, the electronic apparatus 10 may perform the following operations for each factor i∈ [1, ..., n] of the n factors: remove the factor i from the sample by a factor removal neural network; compute residual information Ri corresponding to the factor i based on the sample and the output Ci of the factor removal neural network; compute a difference Di between the output Ci and a sum of residual information for the n-1 factor(s) except the factor i; and extract a feature from the difference Di by the feature extraction neural network.
As shown in Figure 5, factors 1 and 2 may be removed respectively from the sample by the factor removal neural networks 508 and 510. In general, the factor removal neural networks 508 and 510 may be different neural networks, each of which is suitable for removing a specific factor from the sample. The factor removal neural networks 508 and 510 may each contain a plurality of layers, which are denoted by the corresponding layer symbols shown in Figure 5; the number of layers of the two networks may be different, though the same number of layers m is shown in Figure 5. The output C1 of the factor removal neural network 508 stands for the estimated sample where factor 1 has been removed from the sample, and the output C2 of the factor removal neural network 510 stands for the estimated sample where factor 2 has been removed from the sample.
Then residual information R1 corresponding to factor 1 is computed by subtracting the output C1 of the factor removal neural network 508 from the sample. Residual information R2 corresponding to factor 2 is computed by subtracting the output C2 of the factor removal neural network 510 from the sample.
A difference D1 is computed between the output C1 of the factor removal neural network 508 and a sum of residual information for the n-1 factor(s) except factor 1, wherein in this embodiment the sum of residual information is R2, so that D1 = C1 - R2. Similarly, a difference D2 is computed between the output C2 of the factor removal neural network 510 and a sum of residual information for the n-1 factor(s) except factor 2, wherein in this embodiment the sum of residual information is R1, so that D2 = C2 - R1. The differences D1 and D2 may each stand for a processed sample where both factor 1 and factor 2 are removed from the sample.
Then the differences D1 and D2 may be input to the feature extraction neural networks 512 and 514 respectively to extract features from the differences D1 and D2. The output of the feature extraction neural network 512 may stand for the feature extracted from the difference D1, and the output of the feature extraction neural network 514 may stand for the feature extracted from the difference D2. In general, the feature extraction neural networks 512 and 514 may be the same feature extraction neural network. The feature extraction neural networks 512 and 514 may each contain a plurality of layers, which are denoted by the corresponding layer symbols shown in Figure 5. In other embodiments, there may be one feature extraction neural network in the neural network 500 and each difference may be sequentially input to the feature extraction neural network. In addition, the feature extraction neural networks 512 and 514 may be any suitable feature extraction neural networks, for example depending on the features to be extracted.
Turning to Figure 3, at block 308, the electronic apparatus 10 may stack the n features to input to the classification neural network. For example, as shown in Figure 5, the feature layer output by the feature extraction neural network 512 and the feature layer output by the feature extraction neural network 514 are stacked to form the first layer E1 of the classification neural network 516. The stacking operation may further comprise a convolution operation, an activation operation and a pooling operation. The classification neural network 516 may comprise k layers denoted by E1, E2, …, Ek, wherein k≥3. The classification neural network 516 may be any suitable classification neural network, for example depending on the features to be classified. The last layer Ek of the classification neural network 516 may be outputted as a detection/classification result at block 310.
In an embodiment, the process 300 may be used in the ADAS/autonomous vehicle, for example for object detection. For example, the ADAS or autonomous vehicle is equipped with a vision system, and the process 300 can be integrated into the vision system. In the vision system, an image is captured by a camera and important objects such as pedestrians and bicycles are detected from the image by the process 300. In the ADAS, some form of warning (e.g., a warning voice) may be generated if important objects (e.g., pedestrians) are detected, so that the driver of the vehicle can pay attention to the objects and try to avoid a traffic accident. In the autonomous vehicle, the detected objects may be used as inputs of a control module, and the control module takes proper action according to the objects.
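Purely as an illustration of how the detection result might be consumed in the ADAS case, and under the assumptions of the sketches above, a hypothetical integration could look as follows; frame, PEDESTRIAN_CLASS and issue_warning are names invented for this sketch and are not part of the disclosure.

PEDESTRIAN_CLASS = 1                                      # hypothetical class index for pedestrians

def issue_warning(message):
    # hypothetical warning hook; a real ADAS might trigger, e.g., a warning voice instead
    print("WARNING:", message)

def on_new_frame(frame, removal_nets, extraction_nets, classifier):
    prediction = detect(frame, removal_nets, extraction_nets, classifier)
    if (prediction == PEDESTRIAN_CLASS).any():
        issue_warning("pedestrian detected")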
Some advantages of the method of the embodiments of the disclosure are as follows. (1) The method constructs a neural network (such as a deep convolutional neural network) which greatly improves the performance of object detection systems when the sample input to the object detection system is degraded by at least two factors. (2) The restoration residual corresponding to one factor, such as R1 in Figure 4, is used to deal with the sample degraded by another factor, which greatly weakens the negative influence of the factors on the sample. (3) The adverse factor removal, feature extraction and classification are jointly performed under the framework of a neural network such as the deep convolutional neural network.
According to an aspect of the disclosure it is provided an apparatus for object detection. For same parts as in the previous embodiments, the description thereof may be omitted as appropriate. The apparatus may comprise means configured to carry out the processes described above. In an embodiment, the apparatus comprises means configured to receive a sample degraded by at least two factors; means configured to perform the following operations for each factor of the at least two factors: remove a factor of the at least two factors from the sample by a factor removal neural network; compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by a feature extraction neural network; means configured to stack the feature extracted by each feature extraction neural network to input to a classification neural network; and means configured to output the result of classification neural network as a detection result.
In an embodiment, the apparatus further comprises means configured to train the factor removal neural network, the feature extraction neural network and the classification neural network.
In an embodiment, the apparatus further comprises means configured to receive a set of pairs of training samples with labels, wherein each pair of training samples contains first training sample degraded by the at least two factors and second training sample where a factor of the at least two factors does not degrade the first training sample; means configured to perform the following operations for the factor of each pair of training samples: remove the factor from the first training sample by the factor removal neural network; compute residual information corresponding to the factor based on the first training sample and the output of the factor removal neural  network; compute a loss of the factor removal neural network based on the difference between the output and the second training sample; compute a difference between the output and a sum of residual information for all the other factor (s) except the factor; extract a feature from the difference by the feature extraction neural network; means configured to stack the feature extracted by each feature extraction neural network to input to the classification neural network; means configured to compute a classification loss based on the result of classification neural network and the label of the first training sample; means configured to add the loss of each factor removal neural network and the classification loss to form joint loss; and means configured to learn the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network by minimizing the joint loss with the standard back-propagation algorithm.
In an embodiment, the sample and the training sample comprise one of image, audio and text.
In an embodiment, the sample and the training sample is the image, and the factors comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball, pollen.
In an embodiment, the apparatus is used in an advanced driver assistance system/autonomous vehicle.
In an embodiment, the neural network comprises a convolutional neural network.
It is noted that any of the components of the apparatus described above can be implemented as hardware or software modules. In the case of software modules, they  can be embodied on a tangible computer-readable recordable storage medium. All of the software modules (or any subset thereof) can be on the same medium, or each can be on a different medium, for example. The software modules can run, for example, on a hardware processor. The method steps can then be carried out using the distinct software modules, as described above, executing on a hardware processor.
Additionally, an aspect of the disclosure can make use of software running on a general purpose computer or workstation. Such an implementation might employ, for example, a processor, a memory, and an input/output interface formed, for example, by a display and a keyboard. The term “processor” as used herein is intended to include any processing device, such as, for example, one that includes a CPU (central processing unit) and/or other forms of processing circuitry. Further, the term “processor” may refer to more than one individual processor. The term “memory” is intended to include memory associated with a processor or CPU, such as, for example, RAM (random access memory) , ROM (read only memory) , a fixed memory device (for example, hard drive) , a removable memory device (for example, diskette) , a flash memory and the like. The processor, memory, and input/output interface such as display and keyboard can be interconnected, for example, via bus as part of a data processing unit. Suitable interconnections, for example via bus, can also be provided to a network interface, such as a network card, which can be provided to interface with a computer network, and to a media interface, such as a diskette or CD-ROM drive, which can be provided to interface with media.
Accordingly, computer software including instructions or code for performing the methodologies of the disclosure, as described herein, may be stored in associated memory devices (for example, ROM, fixed or removable memory) and, when ready to be utilized, loaded in part or in whole (for example, into RAM) and implemented  by a CPU. Such software could include, but is not limited to, firmware, resident software, microcode, and the like.
As noted, aspects of the disclosure may take the form of a computer program product embodied in a computer readable medium having computer readable program code embodied thereon. Also, any combination of computer readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations for aspects of the disclosure may be written in any combination of at least one programming language, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, component, segment, or portion of code, which comprises at least one executable instruction for implementing the specified logical function (s) . It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It should be noted that the terms "connected, " "coupled, " or any variant thereof, mean any connection or coupling, either direct or indirect, between two or more elements, and may encompass the presence of one or more intermediate elements between two elements that are "connected" or "coupled" together. The coupling or connection between the elements can be physical, logical, or a combination thereof. As employed herein, two elements may be considered to be "connected" or "coupled" together by the use of one or more wires, cables and/or printed electrical connections, as well as by the use of electromagnetic energy, such as  electromagnetic energy having wavelengths in the radio frequency region, the microwave region and the optical region (both visible and invisible) , as several non-limiting and non-exhaustive examples.
In any case, it should be understood that the components illustrated in this disclosure may be implemented in various forms of hardware, software, or combinations thereof, for example, application specific integrated circuit (s) (ASICS) , a functional circuitry, a graphics processing unit, an appropriately programmed general purpose digital computer with associated memory, and the like. Given the teachings of the disclosure provided herein, one of ordinary skill in the related art will be able to contemplate other implementations of the components of the disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a, ” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising, ” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of another feature, integer, step, operation, element, component, and/or group thereof.
The descriptions of the various embodiments have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments.

Claims (16)

  1. An apparatus, comprising:
    at least one processor;
    at least one memory including computer program code, the memory and the computer program code configured to, working with the at least one processor, cause the apparatus to
    receive a sample degraded by at least two factors;
    perform the following operations for each factor of the at least two factors:
    remove a factor of the at least two factors from the sample by a factor removal neural network;
    compute residual information corresponding to the factor based on the sample and the output of the factor removal neural network;
    compute a difference between the output and a sum of residual information for all the other factor (s) except the factor;
    extract a feature from the difference by a feature extraction neural network;
    stack the feature extracted by each feature extraction neural network to input to a classification neural network; and
    output the result of classification neural network as a detection result.
  2. The apparatus according to claim 1, wherein the memory and the computer program code is further configured to, working with the at least one processor, cause the apparatus to
    train the factor removal neural network, the feature extraction neural network and the classification neural network.
  3. The apparatus according to claim 2, wherein the memory and the computer program code is further configured to, working with the at least one processor, cause the apparatus to
    receive a set of pairs of training samples with labels, wherein each pair of training samples contains first training sample degraded by the at least two factors and second training sample where a factor of the at least two factors does not degrade the first training sample;
    perform the following operations for the factor of each pair of training samples:
    remove the factor from the first training sample by the factor removal neural network;
    compute residual information corresponding to the factor based on the first training sample and the output of the factor removal neural network;
    compute a loss of the factor removal neural network based on the difference between the output and the second training sample;
    compute a difference between the output and a sum of residual information for all the other factor (s) except the factor;
    extract a feature from the difference by the feature extraction neural network;
    stack the feature extracted by each feature extraction neural network to input to the classification neural network;
    compute a classification loss based on the result of classification neural network and the label of the first training sample;
    add the loss of each factor removal neural network and the classification loss to form joint loss; and
    learn the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network by minimizing the joint loss with the standard back-propagation algorithm.
  4. The apparatus according to any one of claims 1-3, wherein the sample and the training sample comprise one of image, audio and text.
  5. The apparatus according to any one of claims 1-4, wherein the sample and the training sample is the image, and the factors comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball, pollen.
  6. The apparatus according to any one of claims 1-5, wherein the apparatus is used in an advanced driver assistance system/autonomous vehicle.
  7. The apparatus according to any one of claims 1-6, wherein the neural network comprises a convolutional neural network.
  8. A method comprising:
    receiving a sample degraded by at least two factors;
    performing the following operations for each factor of the at least two factors:
    removing a factor of the at least two factors from the sample by a factor removal neural network;
    computing residual information corresponding to the factor based on the sample and the output of the factor removal neural network;
    computing a difference between the output and a sum of residual information for all the other factor (s) except the factor;
    extracting a feature from the difference by a feature extraction neural network;
    stacking the feature extracted by each feature extraction neural network to input to a classification neural network; and
    outputting the result of classification neural network as a detection result.
  9. The method according to claim 8, further comprising
    training the factor removal neural network, the feature extraction neural network and the classification neural network.
  10. The method according to claim 9, wherein the training comprises
    receiving a set of pairs of training samples with labels, wherein each pair of training samples contains first training sample degraded by the at least two factors and second training sample where a factor of the at least two factors does not degrade the first training sample;
    performing the following operations for the factor of each pair of training samples:
    removing the factor from the first training sample by the factor removal neural network;
    computing residual information corresponding to the factor based on the first training sample and the output of the factor removal neural network;
    computing a loss of the factor removal neural network based on the difference between the output and the second training sample;
    computing a difference between the output and a sum of residual information for all the other factor (s) except the factor;
    extracting a feature from the difference by the feature extraction neural network;
    stacking the feature extracted by each feature extraction neural network to input to the classification neural network;
    computing a classification loss based on the result of classification neural network and the label of the first training sample;
    adding the loss of each factor removal neural network and the classification loss to form joint loss; and
    learning the parameters of the factor removal neural network, the feature extraction neural network and the classification neural network by minimizing the joint loss with the standard back-propagation algorithm.
  11. The method according to any one of claims 8-10, wherein the sample and the training sample comprise one of image, audio and text.
  12. The method according to any one of claims 8-11, wherein the sample and the training sample is the image, and the factors comprise at least two of haze, fog, dark light, dust storm, sand storm, snow, hailstone, blowball, pollen.
  13. The method according to any one of claims 8-12, wherein the method is used in an advanced driver assistance system/autonomous vehicle.
  14. An apparatus, comprising means configured to carry out the method according to any one of claims 8 to 13.
  15. A computer program product embodied on a distribution medium readable by a computer and comprising program instructions which, when loaded into a computer, execute the method according to any one of claims 8 to 13.
  16. A non-transitory computer readable medium having encoded thereon statements and instructions to cause a processor to execute a method according to any one of claims 8 to 13.
PCT/CN2017/071477 2017-01-18 2017-01-18 Apparatus, method and computer program product for object detection Ceased WO2018132961A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/071477 WO2018132961A1 (en) 2017-01-18 2017-01-18 Apparatus, method and computer program product for object detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/071477 WO2018132961A1 (en) 2017-01-18 2017-01-18 Apparatus, method and computer program product for object detection

Publications (1)

Publication Number Publication Date
WO2018132961A1 true WO2018132961A1 (en) 2018-07-26

Family

ID=62907723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/071477 Ceased WO2018132961A1 (en) 2017-01-18 2017-01-18 Apparatus, method and computer program product for object detection

Country Status (1)

Country Link
WO (1) WO2018132961A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697464A (en) * 2018-12-17 2019-04-30 环球智达科技(北京)有限公司 Method and system based on the identification of the precision target of object detection and signature search
CN110570371A (en) * 2019-08-28 2019-12-13 天津大学 An image defogging method based on multi-scale residual learning
CN112132169A (en) * 2019-06-25 2020-12-25 富士通株式会社 Information processing apparatus and information processing method
CN112184590A (en) * 2020-09-30 2021-01-05 西安理工大学 A Single Dust Image Restoration Method Based on Grayscale World Self-Guided Network
CN114283350A (en) * 2021-09-17 2022-04-05 腾讯科技(深圳)有限公司 Visual model training and video processing method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914813A (en) * 2014-04-10 2014-07-09 西安电子科技大学 Colorful haze image defogging and illumination compensation restoration method
US20160078605A1 (en) * 2014-09-16 2016-03-17 National Taipei University Of Technology Image restoration method and image processing apparatus using the same
CN105844257A (en) * 2016-04-11 2016-08-10 吉林大学 Early warning system based on machine vision driving-in-fog road denoter missing and early warning method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103914813A (en) * 2014-04-10 2014-07-09 西安电子科技大学 Colorful haze image defogging and illumination compensation restoration method
US20160078605A1 (en) * 2014-09-16 2016-03-17 National Taipei University Of Technology Image restoration method and image processing apparatus using the same
CN105844257A (en) * 2016-04-11 2016-08-10 吉林大学 Early warning system based on machine vision driving-in-fog road denoter missing and early warning method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HE KAIMING ET AL.: "Deep Residual Learning for Image Recognition", 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, 31 December 2016 (2016-12-31), XP055353100 *
TIAN YU ET AL.: "Adaptive Optics Images Restoration Based on Frame Selection and Multi- Frame Blind Deconvolution", ACTA ASTRONOMICA SINICA, vol. 49, no. 4, 31 October 2008 (2008-10-31), XP026091076, ISSN: 0001-5245, Retrieved from the Internet <URL:https://doi.org/10.1016/j.chinastron.2009.03.004> *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697464A (en) * 2018-12-17 2019-04-30 环球智达科技(北京)有限公司 Method and system for precise target recognition based on object detection and signature search
CN112132169A (en) * 2019-06-25 2020-12-25 富士通株式会社 Information processing apparatus and information processing method
CN112132169B (en) * 2019-06-25 2023-08-04 富士通株式会社 Information processing apparatus and information processing method
CN110570371A (en) * 2019-08-28 2019-12-13 天津大学 An image defogging method based on multi-scale residual learning
CN110570371B (en) * 2019-08-28 2023-08-29 天津大学 Image defogging method based on multi-scale residual learning
CN112184590A (en) * 2020-09-30 2021-01-05 西安理工大学 A Single Dust Image Restoration Method Based on Grayscale World Self-Guided Network
CN112184590B (en) * 2020-09-30 2024-03-26 西安理工大学 Single dust image restoration method based on gray-world self-guided network
CN114283350A (en) * 2021-09-17 2022-04-05 腾讯科技(深圳)有限公司 Visual model training and video processing method, device, equipment and storage medium
CN114283350B (en) * 2021-09-17 2024-06-07 腾讯科技(深圳)有限公司 Visual model training and video processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109635621B (en) System and method for recognizing gestures based on deep learning in first-person perspective
WO2019222951A1 (en) Method and apparatus for computer vision
US11443438B2 (en) Network module and distribution method and apparatus, electronic device, and storage medium
US8823798B2 (en) Obscuring identification information in an image of a vehicle
CN108229324B (en) Gesture tracking method and device, electronic equipment and computer storage medium
WO2018132961A1 (en) Apparatus, method and computer program product for object detection
US20200380263A1 (en) Detecting key frames in video compression in an artificial intelligence semiconductor solution
US11386287B2 (en) Method and apparatus for computer vision
WO2017074786A1 (en) System and method for automatic detection of spherical video content
WO2022166625A1 (en) Method for information pushing in vehicle travel scenario, and related apparatus
WO2018002436A1 (en) Method and apparatus for removing turbid objects in an image
WO2017197593A1 (en) Apparatus, method and computer program product for recovering editable slide
CN112396060B (en) Identification card recognition method based on identification card segmentation model and related equipment thereof
CN112287945A (en) Screen fragmentation determination method and device, computer equipment and computer readable storage medium
CN107886110A Face detection method and device, and electronic equipment
CN110121719A (en) Device, method and computer program product for deep learning
CN107516295A (en) Method and device for removing noise in an image
CN108304840B (en) Image data processing method and device
CN116434173B (en) Road image detection method, device, electronic device and storage medium
CN118451417A (en) Traffic robbery identification method, device and system based on large model and storage medium
CN117557930A (en) Fire disaster identification method and fire disaster identification device based on aerial image
CN117113231A Multi-modal dangerous environment perception and early warning method for head-down smartphone users, based on mobile terminals
CN113627243B (en) A text recognition method and related device
CN113628148A (en) Infrared image noise reduction method and device
CN119027908B (en) Multi-scale traffic signal lamp detection and identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17893456

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17893456

Country of ref document: EP

Kind code of ref document: A1