As an integral part of a security plan, surveillance systems are utilized for crime prevention, detection, and prosecution purposes. Common locations where surveillance systems are utilized include banks, airports, casinos, parking lots and garages, parks, corporate facilities, and the like, though increased availability has brought about their deployment in more private settings such as personal residences. Surveillance systems typically employ closed circuit television (CCTV) cameras that are connected to a central observation station with one or more video monitors. Thus, multiple locations may be monitored simultaneously from a single location by a minimal number of operators. By design, CCTV systems are intended for security personnel to constantly observe the monitors and respond to any incidents in real time. CCTV cameras may also be connected to video recording devices that archive footage for subsequent viewing, analysis, and other related uses.
Earlier analog CCTV systems were deficient in a number of different respects. The distance between each individual camera and the central monitoring station was limited because of transmission distance restrictions associated with coaxial cables. These coaxial cables were bulky and therefore challenging to route, and were also costly to acquire and maintain.
In order for video footage to be archived, it was necessary to record to magnetic storage devices such as Video Home System (VHS) tapes with videocassette recorders (VCRs). Because videotape is a sequential access medium, random access to relevant footage is challenging at best. Furthermore, archiving footage for potential future needs was problematic. In Standard Play (SP) mode, depending on the length of tape in the cassette, anywhere from 2 to 3 hours of footage could be recorded. In other play modes such as Extended Play (EP) and Long Play (LP), up to 8 hours or 12 hours of footage, respectively, could be recorded. As such, numerous videocassette tapes were necessary, along with an appropriate system to manage the rotation of tapes. The number of videotapes increased rapidly where many video cameras were deployed, as each camera typically required its own VCR and tape. Although multiple cameras could be aggregated into a single monitor and a single VCR/tape, large CCTV installations were nevertheless deficient.
With developments in and widespread deployment of networking technology, CCTV surveillance systems are increasingly being replaced with Internet Protocol (IP) network camera systems. Like analog CCTV systems, cameras are installed in multiple locations, with the footage being viewable from the central monitoring station. The cameras have digital sensors, however, in which photons of light from each image or frame of video are converted to data representative of the same. This data is transmitted over a conventional data transfer link such as Ethernet. A minimalist video server may be incorporated into each of the cameras, and a remote client software application may communicate with each of the video servers to request video data for display. Because the networking protocols for IP network camera systems are the same as those utilized in standard computer networks, surveillance systems can use existing network infrastructure. Unlike analog CCTV systems, IP network camera systems do not have distance limitations, and because a large volume of data can be stored with relative ease on hard disk drives, optical discs, and other such media, the burdens previously associated with access and management of surveillance footage are greatly reduced.
With ever-growing data storage capacities, various applications and needs that exploit those improvements continually arise. For example, owners of department stores and malls generally have a duty of care to customers to eliminate any potential hazards on the premises, and to aid anyone that may have become injured. Due to the prevailing litigious atmosphere, the owner of the premises may well be sued weeks, or even months, after the injury.
Without conclusive video footage of accidents and the responses thereto, a victim-plaintiff prepared with convincing medical reports and professional expert witnesses may quickly gain the upper hand in litigation and settlement negotiations. Accordingly, it is common for surveillance footage in such commercial/retail environments to be retained for as long as two months, and even longer.
Mode for Invention
The detailed description set forth below in connection with the appended drawings is intended as a description of the presently preferred embodiment of the invention, and is not intended to represent the only form in which the present invention may be constructed or utilized. The description sets forth the functions of the invention in connection with the illustrated embodiment. It is to be understood, however, that the same or equivalent functions may be accomplished by different embodiments that are also intended to be encompassed within the scope of the invention. It is further understood that relational terms such as first and second, top and bottom, and the like are used solely to distinguish one entity or step from another, without necessarily requiring or implying any actual such relationship or order between such entities or steps.
The illustration of FIG. 1 shows an exemplary environment 10 in which a surveillance system of the present invention may be installed and deployed. By way of example, the environment 10 includes a building 12 that is located on a street corner. There is an entrance 14 that faces one of the streets, and a gated side entrance 16 on another one of the streets. Along the first street, there is an entryway into a parking lot 18. As part of its security arrangements, these various locations are monitored by surveillance camera units 20. In further detail, the parking lot 18 is monitored by a first camera unit 20a in order to monitor, for example, potential carjackings, assaults, robberies, vehicle vandalism, and so forth. The entrance 14 is monitored by a pair of second and third camera units 20b, 20c, respectively, to track the people entering and exiting the building 12. The side entrance 16 is monitored by a fourth camera unit 20d to track vehicles and their license plates approaching the gate for access.
Notwithstanding the foregoing installation specificities, it will be appreciated by those having ordinary skill in the art that the surveillance camera units 20 may be installed in different places in the environment 10 for a variety of purposes. As utilized herein, the term location is understood to refer generally to the environment 10 that is being monitored with the surveillance system of the present invention. Furthermore, the term location is also understood to refer to specific segments of the environment 10 that are monitored by a specific one of the surveillance camera units 20. Along these lines, for purposes of explaining the various features of the surveillance system, different surveillance camera units 20 may also be referred to by the particular installation location, e.g., the first surveillance camera unit 20a that covers the parking lot 18 may also be referred to as the parking lot camera unit.
With reference to the block diagram of FIG. 2, further details of the surveillance system 1 as contemplated by various embodiments of the present invention will be described. As noted above, the surveillance system 1 includes one or more surveillance camera units 20 that are positioned throughout various locations in the environment 10 to record video footage therefrom. The parking lot camera unit 20a, the left front door camera unit 20b, the right front door camera unit 20c, and the side entrance camera unit 20d are connected to an internal network 22.
The surveillance footage recorded by the surveillance camera units 20 may be transmitted to and displayed on a remote monitoring station 24 that is manned by security personnel. The remote monitoring station 24 is likewise connected to the internal network 22.
The actual location of the remote monitoring station 24 need not be limited to within the building 12. Monitoring services performed by the security personnel may be outsourced to third party providers otherwise unaffiliated with the management of the building 12, and may also be remotely located from the same.
As will be described in further detail below, the internal network 22 is contemplated to be a Transmission Control Protocol/Internet Protocol (TCP/IP) network that can be interconnected to other systems on the Internet. Data traffic from each of the surveillance camera units 20 may be aggregated by a hub or switch 26 that similarly complies with the TCP/IP standards of the internal network 22. The use of the TCP/IP network in this context is by way of example only, as many existing networked camera devices are compliant therewith. Other networking standards may, however, be substituted without departing from the present invention.
The remote monitoring station 24 may be a conventional desktop computer having a central processing unit, memory, and input and output devices connected thereto such as keyboards, mice, and display units 28. The remote monitoring station 24 is understood to have software instructions loaded thereon that, when executed, perform various functions involved with accessing and displaying footage from the surveillance camera units 20. As will be described in further detail below, the surveillance camera unit 20 functions as a server, as the term is understood in relation to the TCP/IP internal network 22. The remote monitoring station 24 thus functions as a client requesting data from the server. Notwithstanding this client-server relationship, with a communications link established between the surveillance camera unit 20 and the remote monitoring station 24, the surveillance camera unit 20 may notify the remote monitoring station 24 upon the automated detection of certain events.
In one contemplated variation, the remote monitoring station 24 includes a web browser application such as Internet Explorer from Microsoft Corporation of Redmond, Washington, or Firefox from the Mozilla Foundation. The surveillance camera units 20 are understood to have basic versions of a HyperText Transfer Protocol (HTTP) server and a video streaming server. Via plug-in modules supplementing the functionality of the web browser application with media playback features, data from the video streaming server is processed and displayed on the remote monitoring station.
In another contemplated variation, the remote monitoring station 24 is loaded with a dedicated video feed display application such as Maximum® from Axium Technologies, Inc. of Irwindale, California. The video streaming servers of the surveillance camera units 20 communicate directly with such a display application to deliver the recorded surveillance footage.
As best illustrated in FIG. 2, the display 28 may be segregated into four subsections 28a-d, corresponding to each of the surveillance camera units 20a-d in the system 1. A variety of other layouts that conveniently show the different feeds are also envisioned, along with the interactive features that may direct the operation of the surveillance camera units 20.
In further detail shown in FIG. 3, the surveillance camera unit 20 in accordance with one embodiment of the present invention includes an audio module 30, a video module 32, and a central processor 34. The audio module 30 is connected to an acoustic transducer 36 or microphone, which generates an analog electrical signal of the sound from the monitored location. The analog electrical signal is then converted to a digital representation by an analog-to-digital converter (ADC) 38. In some embodiments, the ADC 38 may be incorporated into the audio module 30. Alternatively, the ADC 38 may be a separate, standalone component as shown in the block diagram of FIG. 3. The video module 32 is connected to a video camera 40, which in its most basic form includes a sensor that converts photons of light striking it into a representative
video signal. The photons of light are understood to be reflections from the pertinent scene of the monitored location. Any suitable video camera having various lenses, adjustable apertures, and sensor types and resolutions may be utilized.
Referring to the flowchart of FIG. 4, the present invention also contemplates a method of surveillance. The method begins with a step 200 of receiving an input audio signal of the specific location in the environment 10. There is also a subsequent step 202 of receiving an input video stream of the specific location in the environment 10. It is understood that the step 200 of receiving the input audio signal and the step 202 of receiving the input video stream may occur simultaneously, as the operation of the microphone 36 and the audio module 30 is not exclusive of the operation of the video camera 40 and the video module 32. After the input audio signal and the input video stream are received, the method continues with a step 204 of detecting a triggering event based upon such signals. The processing of the audio signal and of the video stream will be described in turn, below.
The flowchart of FIG. 5 best illustrates the further detailed steps involved with receiving the audio signal and detecting the triggering event. As noted previously, the analog audio signal is acquired from the monitored environment 10 by the acoustic transducer 36, indicated as step 300. This step is understood to correspond generally to step 200 above. Then, in step 302, the ADC 38 converts the analog signal to a digital representation. The converted digital representation is then fed to the audio module 30 in accordance with step 304, and is analyzed to determine whether the recorded sound signal matches any predefined sonic signature in step 306.
If there is determined to be a match in decision branch 308, then an appropriate event signal indicating the same is generated to the central processor 34 according to step 310. Otherwise, the camera system 20 continues to monitor the environment. It is understood that the foregoing operations on the analog audio signal are continuously performed on a real-time basis.
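By way of illustration only, the signature-matching flow of steps 304 through 310 may be sketched as follows in Python. The function names, the fixed threshold, and the use of normalized cross-correlation as the comparison measure are illustrative assumptions rather than a definitive implementation:

```python
import numpy as np

def matches_signature(samples: np.ndarray, signature: np.ndarray,
                      threshold: float = 0.8) -> bool:
    """Step 306 (sketch): compare the digitized input against one predefined
    sonic signature. Both signals are normalized to zero mean and unit energy
    so the comparison is insensitive to absolute loudness; the peak of the
    normalized cross-correlation is then tested against a threshold."""
    s = samples - samples.mean()
    g = signature - signature.mean()
    s = s / (np.linalg.norm(s) + 1e-12)
    g = g / (np.linalg.norm(g) + 1e-12)
    corr = np.correlate(s, g, mode="valid")
    return bool(corr.max() >= threshold)

def detect_trigger(samples: np.ndarray, signatures: dict):
    """Steps 304-310 (sketch): feed the converted digital representation to
    the matcher for every stored signature; on a match, return an event
    signal naming the matched signature (step 310), else None (branch 308)."""
    for name, sig in signatures.items():
        if matches_signature(samples, sig):
            return {"event": name}
    return None
```

In practice, such a loop would be run continuously on a real-time basis, as described above; an actual embodiment might well employ a more sophisticated comparison than cross-correlation.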
In further detail, the predefined sonic signature is understood to be a reference sample of a sound associated with the triggering event. A variety of triggering events are contemplated, including vehicle collisions, firearm discharge, graffiti vandalism, assault, robbery, burglary, and the like. By way of example, vehicle collisions may have a number of corresponding sounds such as breaking glass, skidding rubber, and crumpling sheet metal. A firearm discharge may include the sound of the explosion of gunpowder and the crack of the bullet reaching supersonic speeds. Graffiti vandalism may have a spray paint can discharge sound, as well as a sound associated with the agitator rolling around within the can. Assault and robbery victims typically scream for help, unless the perpetrator is able to silence them. Like vehicle collisions, burglary and other crimes necessitating the destruction of entry barriers are typically accompanied by sounds of breaking glass and similar impacts.
Each of the foregoing sounds has a particular characteristic that may be compared to the input audio signals. So that the surveillance system 1 is able to respond to a variety of situations, the predefined sonic signature for each of the aforementioned triggering events may be stored for access by the audio module 30. Referring again to the block diagram of FIG. 3, the surveillance camera unit 20 may include a memory module 42 for this purpose. Because the predefined sonic signatures may be stored for subsequent retrieval even when the surveillance camera unit 20 is powered off, the use of a non-volatile memory device such as Flash is envisioned. Further, because updates of the predefined sonic signatures can be provided, the memory module 42 may be removable, such as a Secure Digital (SD) card.
The input audio signals are from a live environment, so there are many superfluous sounds that may be mixed in with the sounds of interest. In further detail, the input audio signal is understood to have a triggering event component that is the sound of interest, and a background noise component that, to increase accuracy, must be minimized. Thus, the level of the background noise component is normalized to that of earlier recorded background noise components that, in hindsight, did not include the triggering event component. A large sampling of the earlier recorded background noise components may be utilized to build an accurate representation of the noise characteristics for the particular location being monitored in the environment 10. Because different points throughout the day, different days of the week, and different months may have different noise characteristics, each such time division may have its own noise normalization levels. This profiling of noise is therefore understood to be intelligent and self-educational.
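A minimal sketch of such per-time-division noise profiling, under the assumption that a running root-mean-square (RMS) level per division is an adequate summary of the background noise, might take the following form (the class and method names are hypothetical):

```python
from collections import defaultdict
import numpy as np

class NoiseProfile:
    """Sketch of the self-learning noise profile: a running mean RMS noise
    floor is tracked per time division (e.g. hour of day), built from samples
    that, in hindsight, contained no triggering event component."""

    def __init__(self):
        # division -> (mean RMS floor, number of samples folded in)
        self.levels = defaultdict(lambda: (0.0, 0))

    def update(self, division: int, samples: np.ndarray) -> None:
        """Fold one event-free recording into the floor for this division."""
        rms = float(np.sqrt(np.mean(samples ** 2)))
        mean, n = self.levels[division]
        self.levels[division] = ((mean * n + rms) / (n + 1), n + 1)

    def normalize(self, division: int, samples: np.ndarray) -> np.ndarray:
        """Scale a new input so its background level matches the learned
        floor for the division; pass through if nothing has been learned."""
        floor, n = self.levels[division]
        rms = float(np.sqrt(np.mean(samples ** 2))) or 1.0
        return samples * (floor / rms) if n else samples
```

A deployed system would of course need a policy for deciding, after the fact, which recordings were event-free before folding them into the profile.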
As indicated above, the audio module 30 compares the input audio signal and the predefined sound signature to determine whether the triggering event occurred. With the input audio signal being in digital form, various digital signal processing (DSP) algorithms may be utilized to determine the degree of similarity. Along these lines, the audio module 30 may be a dedicated DSP microprocessor such as the DaVinci® line of devices, including the DM6446 integrated circuit, from Texas Instruments of Dallas, Texas. It is understood that these DSP devices have architectures that are specially designed for signal processing applications, such as fast multiply-accumulate (MAC) operations, single instruction multiple data (SIMD) operations, and so forth. Those having ordinary skill in the art will be able to readily ascertain an appropriate substitute device.
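As one further illustrative example of the multiply-accumulate-heavy computations such DSP devices are designed to execute efficiently, the cosine similarity of magnitude spectra may be computed as follows. This is merely one of many possible similarity measures, not the particular algorithm of any named device:

```python
import numpy as np

def spectral_similarity(a: np.ndarray, b: np.ndarray, n_fft: int = 256) -> float:
    """Cosine similarity between the magnitude spectra of two signals:
    1.0 for spectrally identical sounds, near 0.0 for unrelated ones.
    The FFT and the dot product are both MAC-dominated operations."""
    A = np.abs(np.fft.rfft(a, n_fft))
    B = np.abs(np.fft.rfft(b, n_fft))
    denom = np.linalg.norm(A) * np.linalg.norm(B)
    return float(A @ B / denom) if denom else 0.0
```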
Turning now to the video analytics feature of the surveillance camera unit 20, as noted above, the method of surveillance according to one embodiment of the present invention includes the step 204 of detecting the triggering event. The footage captured by the video camera 40 is evaluated by the video module 32 for particular events that may be unfolding. The evaluation procedure is understood to be built upon several basic image processing algorithms that involve an analysis of a sequence of image frames of the input video stream, so a DSP device may be utilized. It is possible to utilize two independent devices for the audio module 30 and the video module 32, but it is also contemplated that a single device may perform both functions. The exact circuit implementation is not intended to be limiting.
According to one embodiment, the video module 32 may utilize the OnBoard™ Application Programming Interface (API) from ObjectVideo, Inc. of Reston, VA, though any other video analytics library may be substituted. From the footage, the video module 32 is capable of differentiating between different objects that may appear, including people, vehicles, and other items such as luggage. When such objects cross over a predefined boundary, a tripwire event notification may be generated. Further sophisticated analyses are possible with a second predefined boundary, for which various rules relative to the first boundary may be defined.
Additionally, when objects enter or exit an area of interest, another event notification may be generated. Similar to the enter or exit event, the video module 32 is capable of detecting when an object appears in, or disappears from, an area of interest without first entering, or subsequently exiting, via the periphery. When objects are taken away or left behind, another event notification may be generated. In order to reduce the possibility of false positives, the video module 32 may include the ability to filter out objects that are too small or too large, or objects that change size or shape too rapidly.
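The size and growth-rate filtering just described may be sketched as follows; the object representation (a bounding-box area per detection, with the area from the preceding frame attached) and the thresholds are illustrative assumptions:

```python
def filter_objects(objects, min_area, max_area, max_growth=2.0):
    """False-positive filter (sketch): drop detected objects whose
    bounding-box area falls outside [min_area, max_area], or whose area
    changed by more than a factor of max_growth between consecutive
    frames (an object changing size or shape too rapidly)."""
    kept = []
    for obj in objects:
        area, prev = obj["area"], obj.get("prev_area")
        if not (min_area <= area <= max_area):
            continue  # too small or too large
        if prev and max(area, prev) / min(area, prev) > max_growth:
            continue  # size changed implausibly fast
        kept.append(obj)
    return kept
```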
Again, like the audio analytics, a particular sequence of events detected from the surveillance footage by the video module 32 is understood to be representative of a specific triggering event such as a vehicle crash, a theft, an assault, a robbery, and the like. For example, if a vehicle is observed crossing a second tripwire before a first tripwire when the normal flow of traffic should be the opposite, then it can be determined that the vehicle is travelling in the wrong direction. As another example, when an object is left behind, there is a possibility that it could contain dangerous explosives with the potential to cause serious harm, whereas when an object normally within the area of interest suddenly disappears, it may have been stolen. Prior to committing a crime, people tend to loiter in a location to conduct reconnaissance and/or to pick a suitable victim, though it is just as likely for people to loiter when waiting for someone to arrive.
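The two-tripwire wrong-direction rule described in the example above reduces to a simple ordering test over the chronological sequence of tripwire crossings reported for an object (the tripwire identifiers are hypothetical):

```python
def wrong_direction(crossings):
    """Given the chronological list of tripwire ids ('first'/'second')
    crossed by one tracked object, report wrong-way travel: normal flow
    crosses 'first' then 'second', so observing 'second' before 'first'
    indicates the vehicle is travelling against the normal direction."""
    try:
        return crossings.index("second") < crossings.index("first")
    except ValueError:
        return False  # object did not cross both tripwires
```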
When the triggering event is detected by either the audio module 30 or the video module 32, a notification to that effect is provided to the central processor 34. Various embodiments of the present invention contemplate different ways to proceed based upon the sequence in which the notifications from the audio module 30 and the video module 32 are received. It is envisioned that such functionality reduces the need for constant human monitoring; only when potential events are identified is human monitoring and action necessary.
FIG. 6 is a flowchart illustrating one possible execution flow. Beginning with step 400a, which generally corresponds to step 200 above, the input audio signal is received. Further, in step 400b, which generally corresponds to step 202 above, the input video stream is received. In steps 410a and 410b, a triggering event is detected by the respective one of the audio module 30 and the video module 32. These two steps are understood to correspond to step 204 above. Here, the triggering events detected by the audio module 30 and the video module 32 may be based on the same occurrence in the monitored location, and if so, the two modules will generate their respective event notifications to the central processor 34 at the same time. If the event notifications are not based on the same occurrence, the notifications will generally be received at different times. This evaluation is made in decision branch 420.
The method of surveillance shown in the flowchart of FIG. 4 continues with a step 206 of generating an alarm in response to the received event notifications. Along these lines, and referring back to the flowchart of FIG. 6, only if the notifications are concurrently received does the central processor 34 generate an alarm signal according to corresponding step 430.
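The decision made by the central processor 34 in decision branch 420 may be sketched as follows, under the assumption that "concurrently received" means the two notification timestamps fall within some tolerance window of one another (the window value is an illustrative assumption):

```python
def fuse(audio_time, video_time, window=1.0):
    """Decision branch 420 (sketch). Event notifications from the audio
    and video modules arriving within `window` seconds of each other are
    treated as concurrent and yield an alarm signal (step 430); a lone or
    non-concurrent notification yields only an event signal (step 440).
    Times are in seconds; None means no notification from that module."""
    if (audio_time is not None and video_time is not None
            and abs(audio_time - video_time) <= window):
        return "alarm"
    if audio_time is not None or video_time is not None:
        return "event"
    return None
```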
Optionally, prior to generating the alarm signal, specific location information in the form of Global Positioning System (GPS) coordinates may be generated in step 428. As best shown in FIG. 3, the surveillance camera unit 20 includes a GPS receiver 44 that is connected to the central processor 34. The acquisition of GPS coordinates is well known in the art, and a further description of the same will be omitted. Once generated, the GPS coordinates are incorporated into the alarm signal according to step 429.
With or without the GPS coordinates, the central processor 34 then transmits the alarm signal to the remote monitoring station 24 per step 432. The surveillance camera unit 20 includes a network communications module 46 that establishes a data transfer link to the remote monitoring station 24 over the internal network 22. As noted above, the internal network 22 is a TCP/IP network, with the physical cabling being Ethernet. Therefore, the network communications module 46 is understood to include ports to which Ethernet cables can be connected. Alternative network communications modalities such as WiFi may also be utilized, however, in which case the network communications module 46 would include a wireless transceiver.
Following the transmission of the alarm signal, the recorded input video stream and the input audio signal may also be transmitted to the remote monitoring station 24 in accordance with step 434. Similar to the transmitted alarm signal, this data is transmitted by the network communications module 46. Together with the transmission of the alarm signal in step 432, step 434 generally corresponds to a step 208 of transmitting the input audio signal, the input video stream, and the alarm signal to the remote monitoring station 24 as shown in the flowchart of FIG. 4.
In conjunction with transmitting the input audio signal and the input video stream to the remote monitoring station 24, the data can be stored in the memory module 42 for backup purposes according to step 436. As noted above, the memory module 42 may be a portable device that can be removed from the surveillance camera unit 20.
Additionally, though also optionally, devices attached to a peripheral port 48 may be activated in step 438 after the alarm signal is generated, that is, when the triggering event is detected by both the audio module 30 and the video module 32. Exemplary devices that may be connected to the peripheral port 48 include floodlights or strobe lights, as well as alarm sound generators. Such devices may provide a startling effect to a perpetrator, and direct the attention of nearby security personnel. It will be appreciated that any other suitable device may be so triggered by the central processor 34.
If, on the other hand, the event notifications are not concurrently received as determined in the decision branch, then the central processor 34 simply generates an event signal per step 440, and transmits the same in step 442. The remote monitoring station 24 may record that there was a possible detection of a triggering event from either the input audio signal or the input video stream, and the display 28 may indicate as much.
The particulars shown herein are by way of example and for purposes of illustrative discussion of the embodiments of the present invention only and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the present invention. In this regard, no attempt is made to show structural details of the present invention in more detail than is necessary for the fundamental understanding of the present invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the present invention may be embodied in practice.