JP6575365B2 - Sound source detection apparatus, sound source detection method, and program

Info

Publication number: JP6575365B2
Application number: JP2016003669A
Authority: JP
Inventors: 智佳子松本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-01-12
Filing date: 2016-01-12
Publication date: 2019-09-18
Anticipated expiration: 2036-01-12
Also published as: JP2017125893A

Description

The present invention relates to a sound source detection device, a sound source detection method, and a program.

As a method of detecting sound sources present around a given position, it is known to calculate a phase difference for each frequency component from audio signals picked up by a plurality of microphones (hereinafter also called "mics") and to detect the direction of a sound source based on these phase differences.

Regarding this type of detection method, a method is known that detects a plurality of sound sources by using a linear Hough transform or the like to detect, from the phase difference of each frequency component, figures that reflect the proportional relationship between frequency and phase difference for components originating from the same sound source (see, for example, Patent Document 1).

A method is also known that sets a reference sound vector whose magnitude is the amplitude of the sound observed by a reference mic among the plurality of mics, and detects a plurality of sound sources based on the sum of a plurality of sound vectors representing that reference sound vector (see, for example, Patent Document 2).

Further, as a related technique, a technique is known that identifies the directions of a plurality of other vehicles present in the traveling direction of the host vehicle, based on the frequency of occurrence of sound source directions identified from the phase difference of each frequency component and on thresholds included in vehicle identification information (see, for example, Patent Document 3). In this type of technique, the direction of one other vehicle is first identified based on a first threshold included in the vehicle identification information and the frequency of occurrence of the sound source directions. The direction of a further vehicle is then identified based on the frequency of occurrence of directions different from the identified direction and on a second threshold smaller than the first threshold.

Patent Document 1: JP 2006-254226 A
Patent Document 2: JP 2007-096418 A
Patent Document 3: International Publication No. 2012/098836

The detection methods described above can detect a plurality of sound sources when the sources lie in different directions as seen from the mics, but detecting them is very difficult when they lie in substantially the same direction. That is, with these methods it is very difficult to determine whether sound arriving at the mics from a given direction comes from a single sound source or is a superposition of sounds from a plurality of sound sources. Consequently, while a first sound source is being detected, it is very difficult to detect that a second sound source has appeared in substantially the same direction as the first sound source as seen from the mics, or that a second sound source located in substantially the same direction as the first sound source has stopped emitting sound.

In one aspect, an object of the present invention is to detect a change in the number of sound sources while a sound source is being detected in one direction.

A sound source detection device according to one aspect includes a phase difference calculation unit and a sound source detection unit. The phase difference calculation unit calculates a phase difference for each frequency component of the audio signals, based on the frequency spectra of a plurality of audio signals acquired from a plurality of mics of a mic array device. The sound source detection unit detects the direction in which a sound source exists and the number of sound sources, based on the calculated phase difference of each frequency component. The sound source detection unit includes a frequency component number calculation unit, a phase difference component ratio calculation unit, and a transition information calculation unit. The frequency component number calculation unit calculates the number of frequency components in each phase difference region, based on a plurality of phase difference regions divided according to the arrival direction of sound arriving at the mics and on the calculated phase difference of each frequency component. The phase difference component ratio calculation unit calculates the ratio of the number of frequency components of each phase difference region in the audio signal, based on the calculated number of frequency components in each phase difference region. The transition information calculation unit calculates transition information of the component ratio, based on the temporal change in the ratio of the number of frequency components. The sound source detection unit determines the phase difference region in which a sound source exists based on the number of frequency components in each phase difference region, and determines the number of sound sources within that phase difference region based on the transition information of the ratio of the number of frequency components.

According to the above aspect, a change in the number of sound sources can be detected while a sound source is being detected in one direction.

A diagram showing the functional configuration of the sound source detection device according to the first embodiment.
A flowchart (part 1) explaining the sound source detection process according to the first embodiment.
A flowchart (part 2) explaining the sound source detection process according to the first embodiment.
A flowchart (part 1) explaining the sound source region / sound source number determination process according to the first embodiment.
A flowchart (part 2) explaining the sound source region / sound source number determination process according to the first embodiment.
A diagram showing an example of how the phase difference regions are set.
A diagram showing an example of the variation in phase difference.
A diagram explaining the method of determining the sound source region.
A diagram showing an example of the temporal change of the sound source region.
A graph showing an example of the temporal change of the phase difference component ratio.
A diagram explaining the method of judging a change in the number of sound sources based on the temporal change of the component ratio zone.
A diagram showing an example of a detection result.
A diagram showing the functional configuration of the sound source detection device according to the second embodiment.
A flowchart explaining the process of calculating the phase spectrum difference according to the second embodiment.
A diagram showing the functional configuration of the sound source detection device according to the third embodiment.
A flowchart explaining the process of outputting a detection result according to the third embodiment.
A flowchart (part 1) explaining a modification of the process of outputting a detection result.
A flowchart (part 2) explaining a modification of the process of outputting a detection result.
A diagram showing the functional configuration of the sound source detection device according to the fourth embodiment.
A flowchart (part 1) explaining the process of outputting a detection result according to the fourth embodiment.
A flowchart (part 2) explaining the process of outputting a detection result according to the fourth embodiment.
A flowchart (part 3) explaining the process of outputting a detection result according to the fourth embodiment.
A flowchart (part 4) explaining the process of outputting a detection result according to the fourth embodiment.
A diagram showing the hardware configuration of a computer.

[First Embodiment]
FIG. 1 is a diagram showing the functional configuration of the sound source detection device according to the first embodiment.

As shown in FIG. 1, the sound source detection device 1 according to the present embodiment includes an audio signal reception unit 101, a conversion unit 102, a phase difference calculation unit 103, a sound source region / sound source number detection unit 104, and a detection result output unit 105.

The audio signal reception unit 101 receives audio signals from a microphone array 2 that includes a first microphone 201 and a second microphone 202. Hereinafter, a microphone is simply called a mic. The audio signal reception unit 101 also divides the time-domain audio signal input from the first mic 201 and the time-domain audio signal input from the second mic 202 into processing units (frames) for the sound source detection process.

The conversion unit 102 converts the time-domain audio signals input from the first mic 201 and the second mic 202 into frequency spectra frame by frame. Each frequency spectrum includes an amplitude spectrum and a phase spectrum. Hereinafter, the frequency spectrum of the audio signal input from the first mic 201 is also called the first frequency spectrum, and the frequency spectrum of the audio signal input from the second mic 202 is also called the second frequency spectrum.

The phase difference calculation unit 103 calculates the phase spectrum difference of each frequency component in one frame of the frequency spectrum, based on the phase spectrum of the first frequency spectrum and the phase spectrum of the second frequency spectrum. Hereinafter, the phase spectrum difference is also called the phase difference.

The sound source region / sound source number detection unit 104 detects the phase difference region in which a sound source exists and the number of sound sources, based on the phase difference of each frequency component. A phase difference region is a range of phase differences of the time-domain audio signals that corresponds to an angular range representing the arrival direction of sound arriving at the mic array. Hereinafter, a phase difference region in which a sound source exists is also called a sound source region. The sound source region / sound source number detection unit 104 includes a frequency component number calculation unit 104a, a phase difference component ratio calculation unit 104b, a component ratio transition information calculation unit 104c, and a component ratio holding unit 104d.

The detection result output unit 105 outputs the detection result of the sound source region / sound source number detection unit 104 to an external device or the like.

As described above, the sound source region / sound source number detection unit 104 includes the frequency component number calculation unit 104a, the phase difference component ratio calculation unit 104b, the component ratio transition information calculation unit 104c, and the component ratio holding unit 104d.

The frequency component number calculation unit 104a calculates the number of frequency components in each phase difference region, based on the phase difference of each frequency component in one frame of the phase spectrum.

The phase difference component ratio calculation unit 104b calculates the ratio of the number of frequency components of each phase difference region in one frame of the phase spectrum, based on the number of frequency components in each phase difference region calculated by the frequency component number calculation unit 104a. Hereinafter, the ratio of the number of frequency components of each phase difference region is also called the "component ratio of the phase difference" or the "phase difference component ratio".

The component ratio transition information calculation unit 104c calculates transition information of the component ratio of each phase difference region, based on the phase difference component ratios of the frame currently being processed and the past component ratios held by the component ratio holding unit 104d.

The component ratio holding unit 104d holds the component ratio of each phase difference region calculated by the phase difference component ratio calculation unit 104b.

The sound source region / sound source number detection unit 104 detects the sound source region based on the number of frequency components in each phase difference region calculated by the frequency component number calculation unit 104a. It also detects a change in the number of sound sources in each sound source region, based on the transition information of the component ratio of each phase difference region calculated by the component ratio transition information calculation unit 104c.

FIG. 2A is a flowchart (part 1) explaining the sound source detection process according to the first embodiment. FIG. 2B is a flowchart (part 2) explaining the sound source detection process according to the first embodiment.

When the sound source detection device 1 starts the sound source detection process, as shown in FIG. 2A, it first starts acquiring audio signals from the first mic 201 and the second mic 202 and dividing them into frames (step S1). The process of step S1 is performed by the audio signal reception unit 101, which divides each acquired audio signal into frames of, for example, about 30 ms.
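The frame division of step S1 can be illustrated with a minimal sketch. The function name `split_into_frames`, the 16 kHz sample rate, and the use of non-overlapping frames are assumptions for illustration; the text only states that frames are about 30 ms long.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=30):
    """Split time-domain samples into consecutive, equal-length frames.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


# Hypothetical input: 0.1 s of silence at 16 kHz for each mic.
mic1_signal = [0.0] * 1600
mic2_signal = [0.0] * 1600
frames1 = split_into_frames(mic1_signal, 16000)
frames2 = split_into_frames(mic2_signal, 16000)
```

Both mic signals are split with the same parameters so that frame x of one signal lines up in time with frame x of the other.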

Next, the sound source detection device 1 initializes a variable x, used to identify frames of the audio signal, to 1 (step S2).

Next, the sound source detection device 1 converts the audio signal of frame x into a frequency spectrum (step S3). The process of step S3 is performed by the conversion unit 102, which converts the time-domain audio signal (frame) into a frequency spectrum by, for example, a fast Fourier transform.

Next, the sound source detection device 1 calculates the phase difference of each frequency component based on the phase spectra of the frequency spectra (step S4). The process of step S4 is performed by the phase difference calculation unit 103.
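Steps S3 and S4 can be sketched as follows. This is an illustration under assumptions, not the patented implementation: a naive DFT stands in for the fast Fourier transform to keep the example dependency-free, and the names `dft` and `phase_differences` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform; returns bins 0 .. len(frame) // 2."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def phase_differences(frame1, frame2):
    """Phase of mic 1 minus phase of mic 2 per frequency bin, in [-pi, pi]."""
    spec1 = dft(frame1)
    spec2 = dft(frame2)
    # The phase of a * conj(b) equals phase(a) - phase(b), already wrapped.
    # Bins with near-zero magnitude carry no meaningful phase.
    return [cmath.phase(a * b.conjugate()) for a, b in zip(spec1, spec2)]


# Hypothetical frames: the second mic receives the same tone one sample later.
n = 8
frame_a = [math.cos(2 * math.pi * t / n) for t in range(n)]
frame_b = [math.cos(2 * math.pi * (t - 1) / n) for t in range(n)]
diffs = phase_differences(frame_a, frame_b)
```

For this test tone, the one-sample delay corresponds to a phase difference of pi/4 in bin 1.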

Next, the sound source detection device 1 calculates the number of frequency components in each phase difference region, based on the phase difference of each frequency component in one frame (step S5). The process of step S5 is performed by the frequency component number calculation unit 104a of the sound source region / sound source number detection unit 104. As described above, a phase difference region is a range of phase differences of the time-domain audio signals that corresponds to an angular range representing the arrival direction of sound.
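The counting in step S5 amounts to building a histogram over the phase difference regions. The sketch below assumes, purely for illustration, that the regions are equal-width intervals of a per-component phase difference value in [-pi, pi]; how the regions are actually laid out is defined elsewhere in the document. The name `count_components_per_region` is hypothetical.

```python
import math


def count_components_per_region(phase_diffs, num_regions):
    """Count the frequency components falling in each phase difference region.

    Assumption: regions are equal-width intervals covering [-pi, pi].
    """
    width = 2 * math.pi / num_regions
    counts = [0] * num_regions
    for pd in phase_diffs:
        idx = int((pd + math.pi) / width)
        # Clamp so that a phase difference of exactly pi lands in the last region.
        idx = min(max(idx, 0), num_regions - 1)
        counts[idx] += 1
    return counts


# Hypothetical phase differences for one frame, sorted into M = 4 regions.
counts = count_components_per_region([-3.0, 0.1, 0.2, 3.0, math.pi], 4)
```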

Next, the sound source detection device 1 extracts the top N phase difference regions in descending order of the number of frequency components (step S6). The process of step S6 is performed by the sound source region / sound source number detection unit 104. The number N of phase difference regions extracted in step S6 is any integer satisfying 2 ≤ N < M, where M is the total number of phase difference regions.

Next, the sound source detection device 1 determines whether the extracted phase difference regions are adjacent (step S7). The determination of step S7 is made by the sound source region / sound source number detection unit 104. When the extracted phase difference regions are not adjacent, the sound source region / sound source number detection unit 104 determines that a sound source exists in each of the extracted phase difference regions. Accordingly, when the extracted phase difference regions are not adjacent (step S7; No), the sound source region / sound source number detection unit 104 notifies the detection result output unit 105 that a plurality of sound sources have been detected in the audio signal of frame x, as shown in FIG. 2B (step S8).
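Steps S6 and S7 can be sketched as a sort followed by a contiguity check. The function names are hypothetical, and treating "adjacent" as "the selected region indices form one contiguous run" is an assumption made for this illustration.

```python
def top_n_regions(counts, n):
    """Indices of the n regions with the most frequency components, highest first."""
    order = sorted(range(len(counts)), key=lambda j: counts[j], reverse=True)
    return order[:n]


def regions_are_adjacent(region_indices):
    """True if the selected region indices form one contiguous run (assumption)."""
    ordered = sorted(region_indices)
    return all(b - a == 1 for a, b in zip(ordered, ordered[1:]))


# Hypothetical per-region counts for two frames, with N = 2.
selected_same_direction = top_n_regions([1, 7, 9, 2, 0], 2)
selected_two_directions = top_n_regions([8, 1, 0, 9, 2], 2)
```

In the first case the two busiest regions sit side by side, so processing would continue with step S9; in the second they are separated, which corresponds to the step S8 branch.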

On the other hand, when the phase difference regions are adjacent (step S7; Yes), the sound source detection device 1 next performs the sound source region / sound source number determination process (step S9). The process of step S9 is performed by the sound source region / sound source number detection unit 104. The sound source region / sound source number detection unit 104 first determines the sound source region based on the extracted phase difference regions, and calculates the component ratio of each phase difference region based on the sound source region and the number of frequency components in each phase difference region. Determining the sound source region and calculating the component ratio of each phase difference region are performed by the phase difference component ratio calculation unit 104b, which causes the component ratio holding unit 104d to hold the calculated component ratio of each phase difference region in association with the frame number x. The sound source region / sound source number detection unit 104 also calculates transition information (temporal change) of the phase difference component ratio of each phase difference region, and determines whether the number of sound sources has changed based on the amount of change in the phase difference component ratio of the sound source region. The transition information of the phase difference component ratio is calculated by the component ratio transition information calculation unit 104c. After determining whether the number of sound sources has changed, the sound source region / sound source number detection unit 104 outputs the detection result regarding the change in the number of sound sources to the detection result output unit 105.

When the process of step S8 or S9 ends, the sound source detection device 1 outputs the sound source detection result for the audio signal of frame x (step S10). The process of step S10 is performed by the detection result output unit 105, which transmits the detection result to, for example, an external device such as a server or a storage device not shown in FIG. 1.

After outputting the detection result, the sound source detection device 1 determines whether to end the detection process (step S11). The sound source detection device 1 determines that the detection process is to end (step S11; Yes) when, for example, a user (operator) of the sound source detection device 1 performs an operation to end the process using an input device or the like not shown in FIG. 1. When the detection process is not to end (step S11; No), the sound source detection device 1 updates the variable x to x + 1 (step S12) and repeats the processes of steps S3 to S10.

The flowcharts shown in FIGS. 2A and 2B are merely an example of the sound source detection process, and some of the steps can be changed or omitted as necessary. For example, the processes of steps S3 to S10 for the audio signal of each frame may be pipelined.

As described above, the sound source region / sound source number determination process (step S9) in the sound source detection process according to the present embodiment is performed by the sound source region / sound source number detection unit 104. As the process of step S9, the sound source region / sound source number detection unit 104 performs the processing shown in FIGS. 3A and 3B.

FIG. 3A is a flowchart (part 1) explaining the sound source region / sound source number determination process according to the first embodiment. FIG. 3B is a flowchart (part 2) explaining the sound source region / sound source number determination process according to the first embodiment.

In the sound source region / sound source number determination process, as shown in FIG. 3A, the sound source region / sound source number detection unit 104 first calculates the average position of the phase difference, based on the number of frequency components in the N phase difference regions extracted in step S6 (step S901). The process of step S901 is performed by the phase difference component ratio calculation unit 104b of the sound source region / sound source number detection unit 104. The phase difference component ratio calculation unit 104b calculates the phase difference average position PP using, for example, the following formula (1).

In formula (1), PC_n is the center value of the phase difference of an extracted phase difference region, and FF_n is the number of frequency components in that phase difference region.

Next, the phase difference component ratio calculation unit 104b determines the phase difference region in which the sound source exists (the sound source region), based on the calculated phase difference average position PP (step S902). In step S902, the phase difference component ratio calculation unit 104b selects as the sound source region the phase difference region whose center value is closest to the phase difference average position PP.
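Steps S901 and S902 can be illustrated with a short sketch. Formula (1) itself is not reproduced in this text, so the code assumes one form consistent with the definitions of PC_n and FF_n: a mean of the region centers weighted by the number of frequency components. That form, the function names, and the restriction of the search to the extracted regions are all assumptions.

```python
def average_phase_position(centers, counts):
    """Count-weighted mean of region centers (assumed form of formula (1))."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no frequency components in the extracted regions")
    return sum(pc * ff for pc, ff in zip(centers, counts)) / total


def select_source_region(region_ids, centers, counts):
    """Return (region id, PP) for the region whose center is closest to PP."""
    pp = average_phase_position(centers, counts)
    best = min(range(len(centers)), key=lambda i: abs(centers[i] - pp))
    return region_ids[best], pp


# Hypothetical extracted regions 3 and 4 with centers PC_n and counts FF_n.
source_region, pp = select_source_region([3, 4], [0.5, 1.5], [30, 10])
```

With these numbers PP is 0.75, which is closer to the center of region 3, so region 3 would be taken as the sound source region.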

Next, the phase difference component ratio calculation unit 104b calculates the phase difference component ratio of each phase difference region in the phase spectrum of frame x (step S903). In step S903, the phase difference component ratio calculation unit 104b calculates the phase difference component ratio R(j, x) using, for example, the following formula (2).

The variable j in formula (2) is a value identifying the phase difference region, and α is any value satisfying 0 < α < 1. The phase difference component ratio R(j, x−1) in the phase spectrum of frame x−1 is a positive value. Therefore, when the phase difference component ratio R(j, x) is calculated using formula (2), the ratio R(j, x) of the sound source region approaches 1.0 and the ratio R(j, x) of a non-sound-source region approaches 0.0. When the frame number x is 1, a predetermined value is used for R(j, x−1) = R(j, 0).

In step S903, the phase difference component ratio calculation unit 104b also causes the component ratio holding unit 104d to hold (store) the calculated phase difference component ratio R(j, x).
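Formula (2) is not reproduced in this text, so the following is only a sketch of one update rule that shows the behavior described for step S903: a smoothing coefficient α between 0 and 1, dependence on the previous ratio R(j, x−1), and ratios that drift toward 1.0 for the sound source region and toward 0.0 elsewhere. The first-order recursive smoothing used here, the per-frame targets of 1.0 and 0.0, and the name `update_component_ratios` are assumptions and should not be read as the formula of the patent.

```python
def update_component_ratios(prev_ratios, source_region, alpha=0.9):
    """One assumed update step: R(j, x) = alpha * R(j, x-1) + (1 - alpha) * target_j.

    target_j is 1.0 for the sound source region and 0.0 otherwise (assumption).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return [
        alpha * prev + (1.0 - alpha) * (1.0 if j == source_region else 0.0)
        for j, prev in enumerate(prev_ratios)
    ]


# Hypothetical run: three regions, region 1 stays the sound source region.
ratios = [0.5, 0.5, 0.5]  # predetermined initial values R(j, 0)
history = [ratios]        # plays the role of the component ratio holding unit
for _ in range(200):
    ratios = update_component_ratios(ratios, source_region=1)
    history.append(ratios)
```

Keeping `history` mirrors how the ratios are stored per frame so that later steps can compare the current frame with earlier ones.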

After step S903, the sound source region / sound source number detection unit 104 examines the temporal change of the phase difference component ratio of each phase difference region (step S904). The process of step S904 is performed by the component ratio transition information calculation unit 104c, which reads the phase difference component ratios for a predetermined number of frames up to frame x−1 from the component ratio holding unit 104d and examines the temporal change of the phase difference component ratio of each phase difference region up to frame x.

Next, the sound source region / sound source number detection unit 104 determines whether the phase difference component ratio of the sound source region shows a temporal change exceeding a predetermined amount (step S905). When the change in the phase difference component ratio of the sound source region is at or below the predetermined amount (step S905; No), the sound source region / sound source number detection unit 104 determines that the number of sound sources has not changed. In this case, it notifies the detection result output unit 105 that the number of sound sources has not changed (step S906) and, as shown in FIG. 3B, ends the sound source region / sound source number determination process (return).

On the other hand, when the phase difference component ratio of the sound source region shows a change exceeding the predetermined amount (step S905; Yes), the sound source region / sound source number detection unit 104 next determines, as shown in FIG. 3B, whether the phase difference component ratio of the sound source region has increased (step S907). When the ratio has increased (step S907; Yes), the sound source region / sound source number detection unit 104 notifies the detection result output unit 105 that the number of sound sources in the sound source region has increased (step S908) and ends the sound source region / sound source number determination process. When the ratio has decreased (step S907; No), it notifies the detection result output unit 105 that the number of sound sources in the sound source region has decreased (step S909) and ends the sound source region / sound source number determination process.
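The branch structure of steps S905 to S909 can be sketched as a three-way classification. The function name, the string labels, and the comparison of just two ratio values (one past, one current) are simplifications assumed for illustration; the text speaks of examining a predetermined number of past frames and does not give the threshold value.

```python
def classify_source_count_change(past_ratio, current_ratio, threshold):
    """Classify the change in the sound source region's component ratio.

    Returns "unchanged" (S906), "increased" (S908), or "decreased" (S909).
    """
    change = current_ratio - past_ratio
    if abs(change) <= threshold:
        return "unchanged"
    return "increased" if change > 0 else "decreased"


# Hypothetical ratios of the sound source region and a hypothetical threshold.
steady = classify_source_count_change(0.60, 0.62, threshold=0.05)
joined = classify_source_count_change(0.60, 0.80, threshold=0.05)
left = classify_source_count_change(0.80, 0.60, threshold=0.05)
```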

Thus, the sound source region / sound source number determination process according to the present embodiment determines the sound source region based on the number of frequency components in each phase difference region, and then calculates, from the sound source region and the number of frequency components in each phase difference region, the component ratio of each phase difference region in the phase spectrum for one frame. The component ratio is larger for a phase difference region that contains more frequency components. In addition, as described above, the process calculates the component ratio of each phase difference region so that the component ratio of the sound source region approaches 1.0 and the component ratio of each non-sound-source region approaches 0.0.

When a new sound source appears in the sound source region, its sound is added to the sound already arriving from that region, so the number of frequency components in the sound source region increases. In contrast, the number of frequency components in the phase difference regions other than the sound source region (the non-sound-source regions) hardly changes before and after the time at which the new sound source appears. Therefore, when a new sound source appears in the sound source region, the phase difference component ratio of the sound source region increases. Accordingly, when the increase in the phase difference component ratio of the sound source region exceeds a predetermined amount of change over time, it can be determined that a new sound source has appeared in the sound source region. Conversely, when the number of sound sources in the sound source region decreases, the phase difference component ratio of the sound source region decreases. Therefore, when the decrease in the phase difference component ratio of the sound source region exceeds the predetermined amount of change over time, it can be determined that the number of sound sources in the sound source region has decreased.
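The branch structure of steps S905 to S909 can be sketched as a small function. This is only an illustrative sketch, not the implementation of the embodiment: the function name, the argument names, and the string return values are hypothetical, and the comparison of two consecutive ratio values is an assumption about how the "time change" is measured.

```python
def classify_source_count_change(prev_ratio, curr_ratio, max_change):
    """Classify the change in the number of sound sources in the sound source region.

    prev_ratio, curr_ratio: phase difference component ratio of the sound
    source region at an earlier and a later time (assumed inputs).
    max_change: the predetermined change amount (assumed to be non-negative).
    """
    delta = curr_ratio - prev_ratio
    # Step S905: is the change larger than the predetermined amount?
    if abs(delta) <= max_change:
        return "unchanged"      # step S906
    # Step S907: did the ratio increase?
    if delta > 0:
        return "increased"      # step S908
    return "decreased"          # step S909
```

For example, with a predetermined change amount of 0.1, a ratio that moves from 0.5 to 0.8 would be reported as an increase, while a move from 0.5 to 0.55 would be treated as no change.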

FIG. 4 is a diagram illustrating an example of how the phase difference regions are set.
As shown in FIG. 4, for example, the sound source detection device 1 according to the present embodiment detects the direction and the number of sound sources based on audio signals from a first microphone 201 and a second microphone 202 disposed on an installation surface 3 at a distance d from each other. The direction of a sound source is detected by dividing the angle range of arrival directions of sound that the first microphone 201 and the second microphone 202 can collect into a plurality of small angle ranges, and determining in which of the small angle ranges the sound source exists. For example, as shown in FIG. 4, the angle range −90 ≦ θ ≦ 90 (degrees), where θ = 0 (degrees) is the normal direction of the installation surface 3 passing through the midpoint P between the first microphone 201 and the second microphone 202, is divided into seven small angle ranges PA1 to PA7. Then, based on the phase difference between the audio signal collected by the first microphone 201 and the audio signal collected by the second microphone 202, it is determined in which of the seven small angle ranges PA1 to PA7 the sound source exists.

A phase difference that depends on the direction of the sound source as seen from the first microphone 201 and the second microphone 202 arises between the audio signal collected by the first microphone 201 and the audio signal collected by the second microphone 202. The distance from the first microphone 201 and the second microphone 202 to the sound source is sufficiently longer than the distance d between the two microphones (the inter-microphone distance). Therefore, the path of the sound from the sound source to the first microphone 201 and the path of the sound from the sound source to the second microphone 202 are substantially parallel. Accordingly, the time difference Δt (= T1 − T2) between the time T1 at which a sound emitted by the sound source reaches the first microphone 201 and the time T2 at which that sound reaches the second microphone 202 is approximated by Δt = (d · sin θ) / c. Here, θ is the angle between the normal direction of the installation surface 3 passing through the midpoint P between the first microphone 201 and the second microphone 202 and the direction of the sound source as seen from the midpoint P, and c is the speed of sound.
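As a numerical illustration of the far-field approximation Δt = (d · sin θ) / c, the following sketch computes the arrival time difference and the corresponding phase difference 2πfΔt for one frequency component. The function names, the default speed of sound of 343 m/s, and the example microphone spacing are assumptions made for illustration and do not appear in the specification.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees Celsius


def arrival_time_difference(d, theta_deg, c=SPEED_OF_SOUND):
    """Far-field arrival time difference (seconds) for microphone spacing d (m)."""
    return d * math.sin(math.radians(theta_deg)) / c


def phase_difference(freq_hz, d, theta_deg, c=SPEED_OF_SOUND):
    """Phase difference (radians) of one frequency component: 2 * pi * f * delta_t."""
    return 2.0 * math.pi * freq_hz * arrival_time_difference(d, theta_deg, c)


# Hypothetical setup: microphones 0.1 m apart, source at 30 degrees.
dt = arrival_time_difference(0.1, 30.0)
phi_1k = phase_difference(1000.0, 0.1, 30.0)
phi_2k = phase_difference(2000.0, 0.1, 30.0)
```

Because the phase difference grows linearly with frequency for a fixed θ, `phi_2k` is twice `phi_1k`, and a source at θ = 0 produces no phase difference at any frequency.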

That is, a phase difference corresponding to the time difference Δt arises between the audio signal collected by the first microphone 201 and the audio signal collected by the second microphone 202. The time difference Δt correlates with the direction (angle θ) of the sound source as seen from the first microphone 201 and the second microphone 202. Consequently, the phase difference between the two audio signals also correlates with the direction (angle θ) of the sound source as seen from the two microphones. For this reason, the small angle ranges PA1 to PA7 used to determine the direction of the sound source are referred to in this specification as phase difference regions.

Here, consider the case in which a first sound source 401 exists in the phase difference region PA4, as shown in FIG. 4. The phase difference between the audio signal collected by the first microphone 201 and the audio signal collected by the second microphone 202 correlates with the direction of the first sound source 401 as seen from the first microphone 201 and the second microphone 202. Therefore, the direction of the first sound source 401 can be detected based on the phase difference between the two audio signals.

Furthermore, when a second sound source 402 appears in the phase difference region PA4 while the first sound source 401 is emitting sound, the sound source detection method of the present embodiment can detect the appearance of the second sound source 402 based on transition information of the phase difference component ratio of the phase difference region PA4.

In the example shown in FIG. 4, the angle range −90 ≦ θ ≦ 90 (degrees) is divided into seven phase difference regions PA1 to PA7. However, the number of phase difference regions is not limited to seven and may be set as appropriate according to the desired detection accuracy. Likewise, the angle range representing the arrival direction of sound is not limited to −90 ≦ θ ≦ 90 (degrees) and may be changed as appropriate according to the directivity of the first microphone 201 and the second microphone 202 and the desired sound source detection range.

FIG. 5 is a diagram illustrating an example of variation in phase difference.
The sound collected by the first microphone 201 and the second microphone 202 often contains a plurality of frequency components. The phase difference of sound arriving at the first microphone 201 and the second microphone 202 from a given angle θ (≠ 0 degrees) differs depending on the frequency of the sound. Therefore, when the direction of a sound source is detected based on the phase difference of the audio signals, the frequency spectrum (phase spectrum) of each audio signal is obtained, and the direction of the sound source is detected based on the phase spectrum difference for each frequency component.

The relationship between the frequency components (frequency bins) of the phase spectrum and the phase spectrum difference (phase difference) can be represented, for example, by the scatter diagram shown in FIG. 5. The scatter diagram of FIG. 5 shows an example in which a predetermined frequency range of the collected audio signal (for example, 0 to 4000 Hz) is divided into 128 components.

When a sound source exists in the direction of an angle θy as seen from the first microphone 201 and the second microphone 202, the phase difference of each frequency component can theoretically be represented by the thick straight line shown in FIG. 5. In practice, however, the phase difference of each frequency component varies, as indicated by the black circles in FIG. 5, under the influence of, for example, sound reflected by objects around the sound source (reverberant sound). Therefore, in the sound source detection process of the present embodiment, as shown in FIG. 2A, the number of frequency components whose phase difference falls in each of the phase difference regions PA1 to PA7 is calculated based on the phase difference of each frequency component (step S4). Then, the phase difference region in which a sound source exists is determined based on the number of frequency components in each phase difference region (steps S6 to S9). For example, the phase difference regions PA1 to PA7 are overlaid on the scatter diagram shown in FIG. 5, and the phase difference of each frequency component in the phase spectrum for one frame is counted according to which of the phase difference regions PA1 to PA7 it falls in. The phase difference region in which a sound source exists (the sound source region) is then determined based on the phase difference regions with large numbers of frequency components. In the example shown in FIG. 5, the phase difference region PA6 contains the largest number of frequency components.
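The counting in step S4 can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: it assumes that the seven regions divide −90 to 90 degrees into equal angular widths (the specification does not state the region boundaries), it converts each phase difference back to an angle with the far-field relation, and the function name, argument names, and default speed of sound are hypothetical.

```python
import math
from collections import Counter


def count_components_per_region(freqs_hz, phase_diffs_rad, d, c=343.0, n_regions=7):
    """Count, per region index 1..n_regions, the frequency components that fall in it.

    Assumes equal-width angular regions over -90..90 degrees (an assumption).
    Components whose phase difference maps to no valid angle are not counted.
    """
    width = 180.0 / n_regions
    counts = Counter({region: 0 for region in range(1, n_regions + 1)})
    for f, phi in zip(freqs_hz, phase_diffs_rad):
        if f <= 0.0:
            continue  # the DC bin carries no direction information
        sin_theta = c * phi / (2.0 * math.pi * f * d)
        if abs(sin_theta) > 1.0:
            continue  # phase difference lies outside every region
        theta = math.degrees(math.asin(sin_theta))
        region = min(int((theta + 90.0) // width) + 1, n_regions)
        counts[region] += 1
    return dict(counts)
```

Skipping components that map to no valid angle reflects the later remark that some frequency components fall in none of the regions PA1 to PA7, so the per-region counts need not add up to the number of frequency bins.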

FIG. 6 is a diagram for explaining a method of determining the sound source region. The center value of the phase difference in FIG. 6 is the center value of each phase difference region; in FIG. 6, it is taken at the frequency component with the highest frequency.

When the time-domain audio signal for one frame is converted into a frequency spectrum, the frequency range of, for example, 0 to 4000 Hz is divided into 128 frequency components. Counting, for these 128 frequency components, which of the seven phase difference regions PA1 to PA7 each phase difference falls in yields, for example, the counting result shown in FIG. 6. In the example shown in FIG. 6, the counted frequency components total 100 rather than 128. This is because, among other reasons, the variation in phase difference described above leaves some frequency components whose phase difference falls in none of the phase difference regions PA1 to PA7.

When the phase difference region in which a sound source exists is determined based on the number of frequency components in each phase difference region, the top N phase difference regions are first extracted in descending order of the number of frequency components (step S6). When N = 2, the phase difference regions PA4 and PA5 are extracted from the counting result shown in FIG. 6. Since the extracted phase difference regions PA4 and PA5 are adjacent to each other, the sound source detection device 1 next performs the sound source region / sound source number determination process (step S9), which consists of the processing shown in FIGS. 3A and 3B.

In the sound source region / sound source number determination process, the phase difference average position is first obtained from the extracted phase difference regions PA4 and PA5 (step S901). The phase difference average position is calculated using equation (1). In the example shown in FIG. 6, the center value PC and the number of frequency components FF of the phase difference region PA4 are 0.0 and 23, respectively, and the center value PC and the number of frequency components FF of the phase difference region PA5 are 1.0 and 18, respectively. Accordingly, the phase difference average position PP is approximately 0.4, as shown in equation (3) below.

PP = (0.0 × 23 + 1.0 × 18) / (23 + 18)
   = 18 / 41
   ≒ 0.439 ≒ 0.4 ... (3)

The center value of a phase difference region closest to the average position PP = 0.4 calculated by equation (3) is the center value 0.0 of the phase difference region PA4. Therefore, in the process of step S902, the phase difference component ratio calculation unit 104b determines the phase difference region PA4 to be the region in which the sound source exists (the sound source region).
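The arithmetic of steps S901 and S902 for this example can be reproduced with a short sketch. It assumes that the average position is the count-weighted mean of the region center values, which is the form that equation (3) takes for the FIG. 6 values; the function names and the dictionary layout are hypothetical.

```python
def average_phase_position(regions):
    """Count-weighted mean of center values; regions is a list of (center, count)."""
    total = sum(count for _, count in regions)
    return sum(center * count for center, count in regions) / total


def nearest_region(position, centers):
    """Name of the region whose center value is closest to position."""
    return min(centers, key=lambda name: abs(centers[name] - position))


# Values from the FIG. 6 example: PA4 (center 0.0, 23 components),
# PA5 (center 1.0, 18 components).
pp = average_phase_position([(0.0, 23), (1.0, 18)])
source_region = nearest_region(pp, {"PA4": 0.0, "PA5": 1.0})
```

Here `pp` evaluates to 18/41, and `source_region` is `"PA4"`, matching the result of step S902 described above.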

Although not illustrated, extracting the top two phase difference regions in descending order of the number of frequency components may yield regions that are not adjacent to each other, such as the phase difference regions PA4 and PA6. In such a case, the sound source detection process according to the present embodiment determines that a sound source exists in each of the phase difference regions PA4 and PA6, so both regions become sound source regions.

When the sound source region is determined in this way for each frame from among the phase difference regions with large numbers of frequency components, information representing the change of the sound source region over time, as shown in FIG. 7, is obtained.

FIG. 7 is a diagram illustrating an example of the change of the sound source region over time.
When a sound source exists within the phase difference region PA4 at a position close to the phase difference region PA5, determining the sound source region in each frame by the processing of steps S6, S901, and S902 yields, for example, the result shown in FIG. 7. However, although the change over time shown in FIG. 7 makes it possible to estimate that a sound source exists in the phase difference region PA4, it does not reveal, for example, whether the number of sound sources in the phase difference region PA4 has changed. Therefore, the sound source detection device 1 of the present embodiment calculates the component ratio of each phase difference region (that is, the ratio of the number of frequency components in each phase difference region to the number of frequency components for one frame) and determines changes in the number of sound sources and the like from the change of the component ratio over time (steps S903 to S909). Each phase difference component ratio is calculated using equation (2).
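Equation (2) is not reproduced in this part of the description, so the following sketch does not implement it. It only illustrates the parenthetical definition above, a per-frame ratio of each region's count to the frame's total count. The function name and the choice of denominator are assumptions; the example uses the FIG. 6 counts for PA4 and PA5 and the total of 100 mentioned for that figure.

```python
def component_ratios(counts, total=None):
    """Per-frame ratio of each region's frequency component count to the total.

    counts: dict mapping region name to its frequency component count.
    total: denominator; defaults to the sum of the given counts (an assumption).
    """
    if total is None:
        total = sum(counts.values())
    if total <= 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}


# FIG. 6 example values: 23 components in PA4, 18 in PA5, 100 counted in total.
ratios = component_ratios({"PA4": 23, "PA5": 18}, total=100)
```

Under these assumptions the PA4 ratio is 0.23 and the PA5 ratio is 0.18. The ratio actually used in the embodiment is additionally shaped so that the sound source region approaches 1.0 and non-sound-source regions approach 0.0, which this simple sketch does not attempt.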

FIG. 8 is a graph illustrating an example of the change of the phase difference component ratio over time.
The graph of FIG. 8 shows an example of the change over time of the phase difference component ratios calculated from the audio signals collected from time t2 to t5 when the positional relationship between the sound sources and the microphones is the relationship shown in FIG. 3. The first sound source 401 emits sound throughout the entire section from time t2 to t5, whereas the second sound source 402 emits sound only in the section from time t3 to t4.

In the section from time t2 to t3, only the first sound source 401, which is located within the phase difference region PA4 at a position close to the phase difference region PA5, emits sound. In this case, as shown in FIG. 6, the numbers of frequency components in the phase difference regions PA4 and PA5 are larger than those in the other phase difference regions. Therefore, the sound source region in the section from time t2 to t3 changes over time in the same manner as in the section from time t0 to t1 shown in FIG. 7. That is, in the section from time t2 to t3, frames in which the phase difference region PA4 or PA5 is identified as the sound source region make up the majority, and frames in which another phase difference region is identified as the sound source region are few. Moreover, in this section the phase difference region PA4 is identified as the sound source region most frequently. Accordingly, among the phase difference component ratios calculated using equation (2) for the section from time t2 to t3, the phase difference component ratio R(PA4) of the phase difference region PA4 is the highest for most of the section.

Next, in the section from time t3 to t4, both the first sound source 401 and the second sound source 402, which are located within the phase difference region PA4 at positions close to the phase difference region PA5, emit sound. In this case as well, the sound source region identified by the sound source detection device 1 changes over time as shown in FIG. 7. Accordingly, in the section from time t3 to t4, the component ratio R(PA4) of the phase difference region PA4 and the component ratio R(PA5) of the phase difference region PA5 are larger than the component ratios of the other phase difference regions.

However, in the section from time t3 to t4, the second sound source 402 also emits sound. In addition, the second sound source 402 is closer to the first microphone 201 and the second microphone 202 than the first sound source 401 is. Therefore, in the audio signals collected by the first microphone 201 and the second microphone 202, the reverberation of the sound emitted by the second sound source 402 has a smaller influence than the reverberation of the sound emitted by the first sound source 401. Consequently, the frequency components of the sound emitted by the second sound source 402 that are contained in the collected audio signals show less variation in phase difference than the frequency components of the sound emitted by the first sound source 401. That is, when both the first sound source 401 and the second sound source 402 emit sound, the number of frequency components in the phase difference region PA4 is larger than when only the first sound source 401 emits sound. As a result, in the section from time t3 to t4, the phase difference component ratio R(PA4) of the phase difference region PA4 rises, as shown in FIG. 8, compared with the section from time t2 to t3 in which only the first sound source 401 emits sound. Accordingly, when the increase in the phase difference component ratio of a phase difference region identified as the sound source region exceeds a predetermined change amount, it can be determined that the number of sound sources in that sound source region has increased. In particular, owing to the Lombard effect, a person speaking in a noisy environment tends to speak louder as the noise level rises. Therefore, when, for example, the first sound source 401 is a television and the second sound source 402 is a person, a change in the volume of the first sound source 401 has little effect on the phase difference component ratio. The sound source detection method according to the present embodiment thus makes it possible to detect the appearance of the second sound source 402 even when it appears in substantially the same direction as the first sound source 401 while the first sound source 401 is emitting sound.

Furthermore, when the state changes from one in which both the first sound source 401 and the second sound source 402 emit sound to one in which only the first sound source 401 emits sound, as in the section from time t4 to t5, the phase difference component ratio R(PA4) of the sound source region (the phase difference region PA4) decreases, as shown in FIG. 8. Therefore, when the decrease in the phase difference component ratio of a phase difference region identified as the sound source region exceeds a predetermined change amount, it can be determined that the number of sound sources in the sound source region has decreased.

In the example shown in FIG. 8, relatively large short-term fluctuations also appear in the phase difference component ratio of the phase difference region PA4, which is the sound source region, in the section from time t2 to t3 and in the section from time t4 to t5. To distinguish such short-term fluctuations from fluctuations caused by an increase or decrease in the number of sound sources, the increase or decrease in the number of sound sources may be determined by also taking into account, for example, the change over time of the phase difference component ratio R(PA5) of the phase difference region PA5, which is adjacent to the sound source region PA4. Because the phase difference region PA5 is close to the sound sources 401 and 402, its phase difference component ratio R(PA5) is larger than those of the other non-sound-source regions. At the same time, the phase difference region PA5 is a region in which no sound source actually exists. Therefore, when a new sound source appears in the phase difference region PA4, the phase difference component ratio R(PA5) of the phase difference region PA5 falls in correspondence with the rise in the phase difference component ratio R(PA4) of the phase difference region PA4. Conversely, when the number of sound sources in the phase difference region PA4 decreases, R(PA5) rises in correspondence with the fall in R(PA4). Accordingly, the determination accuracy can be improved by determining that the number of sound sources has changed when the phase difference component ratio of the phase difference region PA5 changes in the direction opposite to the change in the phase difference component ratio of the sound source region PA4.
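The cross-check against the adjacent region could be sketched as follows. This is a hypothetical illustration: the function name, the use of simple differences between two observation times, and the decision to treat a change without an opposite movement in the neighboring region as a mere fluctuation are all assumptions, not details given in the specification.

```python
def confirmed_change(delta_source, delta_neighbor, max_change):
    """Report a change in source count only if the adjacent region moves oppositely.

    delta_source: change in the component ratio of the sound source region (e.g. PA4).
    delta_neighbor: change in the component ratio of the adjacent region (e.g. PA5).
    max_change: the predetermined change amount (assumed non-negative).
    """
    if abs(delta_source) <= max_change:
        return "unchanged"
    # Same sign (or no movement) in the neighbor: treat as a short-term fluctuation.
    if delta_source * delta_neighbor >= 0:
        return "unchanged"
    return "increased" if delta_source > 0 else "decreased"
```

With a predetermined change amount of 0.1, a rise of 0.3 in R(PA4) accompanied by a fall of 0.1 in R(PA5) would be reported as an increase, whereas the same rise accompanied by a rise in R(PA5) would be ignored.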

Furthermore, when the presence or absence of a change in the number of sound sources is determined based on the change of the phase difference component ratio over time, the determination may be made using, for example, the component ratio thresholds THp1 and THp2 shown in FIG. 8. In FIG. 8, two thresholds THp1 and THp2 are set for the component ratio, dividing the component ratio into three zones. Here, the zone in which the component ratio is equal to or higher than the first threshold THp1 is the high-rate zone, the zone in which it is equal to or higher than the second threshold THp2 and lower than the first threshold THp1 is the medium-rate zone, and the zone in which it is equal to or lower than the second threshold THp2 is the low-rate zone. Using these three zones, the change over time of the phase difference component ratio of each phase difference region shown in FIG. 8 can be represented as in FIG. 9.

FIG. 9 is a diagram for explaining a method of determining a change in the number of sound sources based on the change over time of the component ratio zone.

In the change over time of the phase difference component ratios shown in FIG. 8, the phase difference component ratios of the phase difference regions PA1 to PA3, PA6, and PA7 are in the low-rate zone throughout the entire section from time t2 to t5. Therefore, for the phase difference regions PA1 to PA3, PA6, and PA7, in which no sound source exists, the zone of the phase difference component ratio is "low" (low-rate zone) in every frame, as shown in FIG. 9. In contrast, for the phase difference region PA4, in which the sound sources exist, the zone is "medium" (medium-rate zone) from time t2 until immediately before time t3 and changes to "high" (high-rate zone) at time t3. For the phase difference region PA5, which is adjacent to the phase difference region PA4, the zone is "medium" from time t2 until immediately before time t3 and changes to "low" at time t3. Therefore, whether the number of sound sources in the phase difference region PA4 has increased can be determined according to whether the zone of the phase difference component ratio has changed and the same zone has then continued for a predetermined number of frames (L frames) or more.

Also, as shown in FIG. 9, the zone of the phase difference component ratio of the phase difference region PA4 changes from "medium" to "high" at time t3 and then from "high" to "medium" at time t4. In this case as well, whether the number of sound sources in the phase difference region PA4 has decreased can be determined according to whether the zone of the phase difference component ratio has changed in the decreasing direction and the same zone has then continued for a predetermined number of frames (L frames) or more.
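The zone-based determination can be sketched as below. It is a minimal sketch under assumptions: the function names are hypothetical, a ratio exactly equal to THp2 is assigned to the medium-rate zone (the description lists that boundary under both the medium-rate and low-rate zones), and a zone change is accepted only when the new zone holds for `min_frames` consecutive frames, standing in for the L frames of the text.

```python
def zone(ratio, th_high, th_low):
    """Map a component ratio to 'high', 'medium', or 'low' using THp1 and THp2."""
    if ratio >= th_high:
        return "high"
    if ratio >= th_low:   # assumption: a ratio equal to THp2 counts as medium
        return "medium"
    return "low"


def detect_zone_changes(ratios, th_high, th_low, min_frames):
    """Return (frame_index, old_zone, new_zone) for changes that persist min_frames."""
    zones = [zone(r, th_high, th_low) for r in ratios]
    if not zones:
        return []
    events = []
    current = zones[0]
    i = 1
    while i < len(zones):
        if zones[i] != current:
            run = zones[i:i + min_frames]
            # Accept the change only if the new zone lasts for min_frames frames.
            if len(run) == min_frames and all(z == zones[i] for z in run):
                events.append((i, current, zones[i]))
                current = zones[i]
                i += min_frames
                continue
        i += 1
    return events
```

A change from "medium" to "high" in the sound source region would then correspond to an increase in the number of sound sources, and a change from "high" to "medium" to a decrease, while isolated single-frame excursions are ignored.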

Note that the thresholds THp1 and THp2 used to divide the phase difference component ratio into a plurality of zones are arbitrary. They can be set based on, for example, the fluctuation range of the phase difference component ratio of the sound source region while only the first sound source 401 is emitting sound.
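The zone-based determination described above can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function names, the constants `LOW`, `MID`, and `HIGH`, the assumption that THp1 < THp2, and the handling of a ratio exactly equal to a threshold are all hypothetical choices that the description does not specify.

```python
from typing import List, Optional

# Hypothetical zone labels, ordered so that a larger value means a higher ratio.
LOW, MID, HIGH = 0, 1, 2


def classify_zone(ratio: float, thp1: float, thp2: float) -> int:
    """Map a phase difference component ratio to a zone (assumes thp1 < thp2)."""
    if ratio < thp1:
        return LOW
    if ratio < thp2:
        return MID
    return HIGH


def detect_zone_change(zones: List[int], hold_frames: int) -> Optional[str]:
    """Inspect the per-frame zones of one phase difference region.

    Returns "increase" or "decrease" when the most recent zone change has been
    followed by at least hold_frames frames (the L frames of the description)
    in the same zone, and None otherwise.
    """
    if not zones:
        return None
    current = zones[-1]
    # Length of the trailing run of frames that share the current zone.
    run = 0
    for zone in reversed(zones):
        if zone != current:
            break
        run += 1
    if run == len(zones) or run < hold_frames:
        return None  # no change in the history, or the new zone is too short-lived
    previous = zones[-run - 1]
    return "increase" if current > previous else "decrease"
```

With a history such as three “middle” frames followed by five “high” frames and L = 5, the sketch reports an increase. It keeps reporting the change on later frames as long as the run continues, so a caller that wants a single event would need to remember that the change has already been handled.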

FIG. 10 is a diagram illustrating an example of a detection result.
In the sound source detection process of the present embodiment, as described above, a change in the number of sound sources in a sound source region is detected based on the time variation of the phase difference component ratio of each phase difference region. The detection result output unit 105 can therefore output a detection result such as that shown in FIG. 10. In the example of FIG. 10, from frame 1, immediately after the start of the sound source detection process, to frame 879, the number of sound sources is 1 and the sound source region is the phase difference region PA4. When a new sound source appears in the phase difference region PA4 at the time corresponding to frame 880, the sound source detection device 1 determines, by the process of step S9, that the number of sound sources has increased. From frame 880 to frame 999, the number of sound sources is therefore 2, and the sound source region of both sound sources is the phase difference region PA4. When one of the sound sources in the phase difference region PA4 stops emitting sound at the time corresponding to frame 1000, the sound source detection device 1 determines that the number of sound sources has decreased. From frame 1000 to frame 1253, the number of sound sources is therefore 1 again, and the sound source region is the phase difference region PA4.

Suppose that, after that, at the time corresponding to frame 1254, for example, a sound source in the phase difference region PA2 starts emitting sound in addition to the sound source in the phase difference region PA4, so that the top two phase difference regions extracted in step S6 become the phase difference regions PA4 and PA2. In this case, because the phase difference regions PA4 and PA2 are not adjacent to each other (step S7; No), the detection result output unit 105 is notified of a detection result indicating that a plurality of sound sources exist (step S8). The detection result from frame 1254 onward is therefore that the number of sound sources is 2 and the two sound source regions are the phase difference regions PA4 and PA2.
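The extraction of the top two phase difference regions and the adjacency check (steps S6 and S7) might look like the sketch below. It assumes that the regions PA1 to PA7 are identified by the integers 1 to 7 and that two regions are adjacent exactly when their indices differ by 1; the function names and the sample ratios are hypothetical.

```python
from typing import Dict, Tuple


def top_two_regions(ratios: Dict[int, float]) -> Tuple[int, int]:
    """Return the indices of the two regions with the largest component ratios."""
    ranked = sorted(ratios, key=lambda region: ratios[region], reverse=True)
    return ranked[0], ranked[1]


def are_adjacent(region_a: int, region_b: int) -> bool:
    """Assumed adjacency rule: consecutive region indices are neighbours."""
    return abs(region_a - region_b) == 1


# Illustrative ratios for a frame in which sources in PA4 and PA2 are active.
ratios = {1: 0.02, 2: 0.30, 3: 0.03, 4: 0.55, 5: 0.05, 6: 0.03, 7: 0.02}
first, second = top_two_regions(ratios)
multiple_sources_in_separate_regions = not are_adjacent(first, second)
```

Under these assumptions the example frame yields PA4 and PA2, which are not adjacent, so it corresponds to the step S7; No branch.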

As described above, the present embodiment can detect not only another sound source located in a direction different from that of the first sound source, which emits sound steadily, but also another sound source located in substantially the same direction as the first sound source.

Note that the flowcharts of FIGS. 2A and 2B are merely an example of the sound source detection process. For example, only one phase difference region may be extracted in step S6, and steps S7 and S8 may be omitted. Also, even when the extracted phase difference regions are not adjacent to each other, the sound source region / sound source number determination process may be performed on each extracted phase difference region (sound source region).

In addition, the sound source detection device 1 according to the present embodiment is not limited to using audio signals from the two microphones, the first microphone 201 and the second microphone 202. It may detect sound sources using audio signals from a larger number of microphones.

[Second Embodiment]
FIG. 11 is a diagram illustrating a functional configuration of the sound source detection device according to the second embodiment.

As shown in FIG. 11, the sound source detection device 1 according to the present embodiment includes an audio signal reception unit 101, a conversion unit 102, a phase difference calculation unit 103, a sound source region / sound source number detection unit 104, and a detection result output unit 105.

The audio signal reception unit 101 receives audio signals from a microphone array that includes the first microphone 201 and the second microphone 202. The audio signal reception unit 101 also divides each of the time-domain audio signals input from the first microphone 201 and the second microphone 202 into processing units (frames) for the sound source detection process.

The conversion unit 102 converts the time-domain audio signals input from the first microphone 201 and the second microphone 202 into frequency spectra frame by frame. Each frequency spectrum includes an amplitude spectrum and a phase spectrum.

The phase difference calculation unit 103 calculates the phase difference of each frequency component in the frequency spectrum for one frame based on the phase spectrum of the first frequency spectrum and the phase spectrum of the second frequency spectrum. The first frequency spectrum is the frequency spectrum of the audio signal input from the first microphone 201, and the second frequency spectrum is the frequency spectrum of the audio signal input from the second microphone 202. The phase difference calculation unit 103 of the sound source detection device of the present embodiment includes a stationary noise estimation unit 103a. The stationary noise estimation unit 103a estimates the stationary noise of each frequency component in the frequency spectrum for one frame. Of all the frequency components in the frequency spectrum for one frame, the phase difference calculation unit 103 according to the present embodiment calculates the phase difference only for those whose amplitude spectrum is larger than the estimated stationary noise.

The sound source region / sound source number detection unit 104 determines the phase difference region in which a sound source exists and the number of sound sources based on the phase difference of each frequency component calculated by the phase difference calculation unit 103. The sound source region / sound source number detection unit 104 includes a frequency component number calculation unit 104a, a phase difference component ratio calculation unit 104b, a component ratio transition information calculation unit 104c, and a component ratio holding unit 104d. Each of the units 104a to 104d of the sound source region / sound source number detection unit 104 in the sound source detection device 1 according to the present embodiment has a function equivalent to that of the corresponding unit 104a to 104d of the sound source region / sound source number detection unit 104 according to the first embodiment.

The detection result output unit 105 outputs the processing result of the sound source region / sound source number detection unit 104 to an external device or the like.

The sound source detection device 1 of the present embodiment performs the sound source detection process according to the flowcharts shown in FIGS. 2A and 2B. In doing so, the phase difference calculation unit 103 of the sound source detection device 1 performs the process shown in FIG. 12 as the process of calculating the phase spectrum difference (step S4).

FIG. 12 is a flowchart illustrating the process of calculating the phase spectrum difference according to the second embodiment.

In the process of calculating the phase spectrum difference in the sound source detection process according to the present embodiment, the phase difference calculation unit 103 first initializes a variable i, which identifies a frequency component, to 0 (step S401).

Next, the phase difference calculation unit 103 estimates the amplitude level of the stationary noise of the frequency component f(i) (step S402). Step S402 is performed by the stationary noise estimation unit 103a, which estimates the amplitude level of the stationary noise using a known stationary noise estimation method.

Next, the phase difference calculation unit 103 compares the amplitude spectrum of the frequency component f(i) in the first frequency spectrum with the estimated amplitude level of the stationary noise (step S403) and determines whether the amplitude spectrum is larger (step S404).

When the amplitude spectrum is larger (step S404; Yes), the phase difference calculation unit 103 calculates the phase spectrum difference of the frequency component f(i) (step S405). The phase difference calculation unit 103 then determines whether any frequency component f(i) remains for which the magnitude relationship between the amplitude spectrum and the estimated amplitude level of the stationary noise has not yet been determined (step S406). On the other hand, when the amplitude spectrum is equal to or lower than the amplitude level of the stationary noise (step S404; No), the phase difference calculation unit 103 skips step S405 and performs the determination of step S406.

When a frequency component f(i) remains for which the magnitude relationship between the amplitude spectrum and the estimated amplitude level of the stationary noise has not yet been determined (step S406; Yes), the phase difference calculation unit 103 updates the variable i to i + 1 (step S407) and returns to step S402. When the magnitude relationship between the amplitude spectrum and the amplitude level of the stationary noise has been determined for all frequency components (step S406; No), the phase difference calculation unit 103 outputs the frequency components for which a phase spectrum difference was calculated, together with those phase spectrum differences (step S408).
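The loop of steps S401 to S408 can be sketched in a few lines. The sketch makes several assumptions that the description leaves open: each spectrum is given as a list of complex values, the stationary noise estimate for each frequency component is supplied by the caller (any known estimator), the difference is taken as the first microphone minus the second, and the result is wrapped to the range [-π, π). The function name `gated_phase_differences` is hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def gated_phase_differences(
    spec1: Sequence[complex],
    spec2: Sequence[complex],
    noise_level: Sequence[float],
) -> List[Tuple[int, float]]:
    """Return (index, phase difference) pairs for components above the noise.

    spec1 and spec2 are the frequency spectra of one frame from the first and
    second microphone; noise_level[i] is the estimated stationary noise
    amplitude of frequency component f(i) (the result of step S402).
    """
    results: List[Tuple[int, float]] = []
    for i in range(len(spec1)):                      # S401, S406, S407
        amplitude = abs(spec1[i])                    # amplitude spectrum of f(i)
        if amplitude > noise_level[i]:               # S403, S404
            diff = cmath.phase(spec1[i]) - cmath.phase(spec2[i])  # S405
            # Assumed convention: wrap the difference into [-pi, pi).
            diff = (diff + math.pi) % (2.0 * math.pi) - math.pi
            results.append((i, diff))
    return results                                   # S408
```

Components whose amplitude is equal to or lower than the noise estimate never reach the phase computation, which is where both the accuracy and the processing-load benefits described for this embodiment would come from.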

As described above, in the sound source detection process according to the present embodiment, the process of calculating the phase spectrum difference (step S4) calculates the phase spectrum difference only for frequency components whose amplitude spectrum is larger than the stationary noise. The subsequent steps of the sound source detection process therefore calculate the phase difference component ratio, and determine whether the number of sound sources has changed, based only on the phase spectrum differences of those frequency components. According to the present embodiment, the sound source region and the number of sound sources can thus be detected with the frequency components of the target sound that are buried in stationary noise excluded, which improves the detection accuracy of the sound source region and the number of sound sources. Excluding those frequency components also reduces the number of operations in the phase spectrum difference calculation, the phase difference component ratio calculation, and so on. The present embodiment can therefore reduce the processing load of the sound source detection process without lowering the detection accuracy of the sound source region and the number of sound sources.

[Third Embodiment]
FIG. 13 is a diagram illustrating a functional configuration of the sound source detection device according to the third embodiment.

As shown in FIG. 13, the sound source detection device 1 according to the present embodiment includes an audio signal reception unit 101, a conversion unit 102, a phase difference calculation unit 103, a sound source region / sound source number detection unit 104, and a detection result output unit 105.

The phase difference calculation unit 103 calculates the phase spectrum difference (phase difference) of each frequency component in the frequency spectrum for one frame based on the phase spectrum of the first frequency spectrum and the phase spectrum of the second frequency spectrum. The first frequency spectrum is the frequency spectrum of the audio signal input from the first microphone 201, and the second frequency spectrum is the frequency spectrum of the audio signal input from the second microphone 202. The phase difference calculation unit 103 may calculate the phase differences of all frequency components in the frequency spectrum for one frame, or may calculate the phase differences only of frequency components whose amplitude spectrum is larger than the estimated amplitude level of the stationary noise.

The sound source region / sound source number detection unit 104 determines the phase difference region in which a sound source exists and the number of sound sources based on the phase difference of each frequency component calculated by the phase difference calculation unit 103. Although omitted from FIG. 13, the sound source region / sound source number detection unit 104 includes a frequency component number calculation unit 104a, a phase difference component ratio calculation unit 104b, a component ratio transition information calculation unit 104c, and a component ratio holding unit 104d.

The detection result output unit 105 outputs the result of the sound source detection process to an external device based on the processing result of the sound source region / sound source number detection unit 104. The detection result output unit 105 in the sound source detection device 1 of the present embodiment includes a duration measurement unit 105a, an abnormal state detection unit 105b, an attention signal output unit 105c, and a detected sound source information holding unit 105d.

The duration measurement unit 105a measures the duration of the state in which no sound source is detected (the non-detection duration of the sound source).

The abnormal state detection unit 105b detects, based on the non-detection duration of the sound source, an abnormal state that has occurred in the space in which sound sources are detected.

When the abnormal state detection unit 105b detects an abnormal state, the attention signal output unit 105c outputs to an external device an attention signal notifying it of the abnormal state.

The detected sound source information holding unit 105d holds (stores) information on the detected sound source region and number of sound sources, information indicating whether a duration is being measured, and the like.

The sound source detection device 1 of the present embodiment performs the sound source detection process according to the flowcharts shown in FIGS. 2A and 2B. In doing so, the detection result output unit 105 of the sound source detection device 1 performs, for example, the process shown in FIG. 14 as the process of outputting the detection result (step S10).

FIG. 14 is a flowchart illustrating the process of outputting the detection result according to the third embodiment.

In the process of outputting the detection result according to the present embodiment, the detection result output unit 105 first holds the detection result of the sound source region and the number of sound sources acquired from the sound source region / sound source number detection unit 104 (step S1001). Next, the detection result output unit 105 determines whether a sound source has been detected based on the detection result of the sound source region (step S1002). The determination of step S1002 is performed by the abnormal state detection unit 105b.

When no sound source has been detected (step S1002; No), the abnormal state detection unit 105b next determines whether the non-detection duration of the sound source is being measured (step S1003). In step S1003, the abnormal state detection unit 105b checks whether the duration measurement unit 105a is measuring the non-detection duration of the sound source. When the non-detection duration is not being measured (step S1003; No), the abnormal state detection unit 105b causes the duration measurement unit 105a to start measuring it (step S1004). After step S1004, the detection result output unit 105 ends the process of outputting the detection result (return).

When no sound source is detected and the non-detection duration is being measured (step S1003; Yes), the abnormal state detection unit 105b next determines whether the non-detection duration is equal to or longer than a time threshold TH (step S1005). When the non-detection duration is equal to or longer than the time threshold TH (step S1005; Yes), the abnormal state detection unit 105b causes the attention signal output unit 105c to output an attention signal (step S1006). In step S1006, the attention signal output unit 105c, for example, places a call to a telephone number registered in advance or sends an e-mail to a mail address registered in advance. After step S1006, the detection result output unit 105 ends the process of outputting the detection result.

On the other hand, when the non-detection duration is shorter than the time threshold TH (step S1005; No), the abnormal state detection unit 105b determines that no abnormality has been detected. In this case, the detection result output unit 105 skips step S1006 and ends the process of outputting the detection result.

When it is determined in step S1002 that a sound source has been detected (step S1002; Yes), the abnormal state detection unit 105b likewise determines next whether the non-detection duration is being measured (step S1007). In step S1007 as well, the abnormal state detection unit 105b checks whether the duration measurement unit 105a is measuring the non-detection duration of the sound source. When the non-detection duration is being measured (step S1007; Yes), the abnormal state detection unit 105b causes the duration measurement unit 105a to stop measuring it (step S1008). After step S1008, the detection result output unit 105 ends the process of outputting the detection result. When the non-detection duration is not being measured (step S1007; No), the detection result output unit 105 skips step S1008 and ends the process of outputting the detection result.
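The flow of FIG. 14 (steps S1002 to S1008) amounts to a small state machine around one timer. The sketch below is one possible reading, not the device's actual implementation: the class name `NonDetectionMonitor`, the `notify` callback that stands in for the attention signal output unit 105c, and the use of caller-supplied timestamps in seconds are all assumptions made for illustration.

```python
from typing import Callable, Optional


class NonDetectionMonitor:
    """Tracks how long no sound source has been detected (sketch of FIG. 14)."""

    def __init__(self, threshold_s: float, notify: Callable[[float], None]) -> None:
        self.threshold_s = threshold_s            # time threshold TH, in seconds
        self.notify = notify                      # hypothetical attention-signal hook
        self.started_at: Optional[float] = None   # None means "not measuring"

    def update(self, detected: bool, now: float) -> None:
        """Process the detection result of one frame observed at time `now`."""
        if detected:                              # S1002; Yes
            if self.started_at is not None:       # S1007; Yes
                self.started_at = None            # S1008: stop measuring
            return
        if self.started_at is None:               # S1003; No
            self.started_at = now                 # S1004: start measuring
            return
        elapsed = now - self.started_at
        if elapsed >= self.threshold_s:           # S1005; Yes
            self.notify(elapsed)                  # S1006: output attention signal
```

Read literally, the flow outputs the attention signal on every frame once the threshold has been reached, and the sketch follows that behaviour. A deployment that sends e-mail or places calls would probably want to limit repeated notifications, which the description does not address.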

For example, when the space in which sound sources are detected is the interior of a house, the people and the various devices in the space are detected as sound sources when they emit sound. A state in which no sound source is detected in the house for a long time therefore means that the people living in the space and the various devices have not emitted sound for a long time. Possible reasons include not only that the person living in the house is away, but also that a person in the house is in a condition in which he or she cannot make a sound.

In the sound source detection process according to the present embodiment, the process of outputting the detection result (step S10) measures the duration of the state in which no sound source is detected (the non-detection duration of the sound source) and detects that an abnormality has occurred when the non-detection duration becomes equal to or longer than the time threshold. When an abnormality occurs, the detection result output unit 105 according to the present embodiment places a call to a telephone number registered in advance, sends an e-mail to a mail address registered in advance, or the like. According to the present embodiment, a person at a location remote from the space in which sound sources are detected can therefore be notified that an abnormal state, in which no sound source has been detected for a long time, has occurred.

Note that the flowchart of FIG. 14 is merely an example of the process of outputting the detection result, and the content of the process can be changed as appropriate. For example, when the duration measurement unit 105a can measure, in addition to the non-detection duration of the sound source, the duration of the state in which a sound source is detected (the detection duration of the sound source), the process of outputting the detection result performed by the detection result output unit 105 may be the process shown in FIGS. 15A and 15B.

FIG. 15A is a flowchart (part 1) illustrating a modification of the process of outputting the detection result. FIG. 15B is a flowchart (part 2) illustrating the modification of the process of outputting the detection result.

In the modification of the process of outputting the detection result, as shown in FIG. 15A, the detection result output unit 105 first holds the detection result of the sound source region and the number of sound sources acquired from the sound source region / sound source number detection unit 104 (step S1011). Next, the detection result output unit 105 determines whether a sound source has been detected based on the detection result of the sound source region (step S1012). The determination of step S1012 is performed by the abnormal state detection unit 105b.

When no sound source has been detected (step S1012; No), the abnormal state detection unit 105b next determines whether the duration measurement unit 105a is measuring the non-detection duration of the sound source (step S1013). When the non-detection duration is not being measured (step S1013; No), the abnormal state detection unit 105b determines that no abnormality has been detected. In this case, the detection result output unit 105 next causes the duration measurement unit 105a to start measuring the non-detection duration (step S1014). After step S1014, the abnormal state detection unit 105b determines whether the duration measurement unit 105a is measuring the detection duration of a sound source (step S1015). When the detection duration is being measured (step S1015; Yes), the abnormal state detection unit 105b causes the duration measurement unit 105a to stop measuring it (step S1016). After step S1016, the detection result output unit 105 ends the process of outputting the detection result (return). When no sound source is detected and the detection duration is not being measured (step S1015; No), the detection result output unit 105 skips step S1016 and ends the process of outputting the detection result.

On the other hand, when no sound source is detected and the non-detection duration is being measured (step S1013; Yes), the abnormal state detection unit 105b next determines whether the non-detection duration is equal to or longer than a first time threshold TH1 (step S1017). When the non-detection duration is equal to or longer than the first time threshold TH1 (step S1017; Yes), the abnormal state detection unit 105b causes the attention signal output unit 105c to output a first attention signal (step S1018). As the first attention signal, the attention signal output unit 105c sends, for example, an e-mail containing a message indicating that the state in which no sound source is detected has continued for a predetermined period to a mail address registered in advance. When the non-detection duration is shorter than the first time threshold TH1 (step S1017; No), the abnormal state detection unit 105b determines that no abnormality has been detected. In this case, the detection result output unit 105 skips step S1018 and ends the process of outputting the detection result.

When it is determined in step S1012 that a sound source has been detected (step S1012; Yes), the abnormal state detection unit 105b next determines, as shown in FIG. 15B, whether there is a sound source region whose detection duration is not being measured (step S1019). When there is such a sound source region (step S1019; Yes), the abnormal state detection unit 105b causes the duration measurement unit 105a to start measuring the detection duration of that sound source region (step S1020). After step S1020, the abnormal state detection unit 105b determines whether the non-detection duration is being measured for the sound source region whose detection duration measurement has just started (step S1021). When the non-detection duration is being measured (step S1021; Yes), the abnormal state detection unit 105b causes the duration measurement unit 105a to stop measuring the non-detection duration (step S1022). The abnormal state detection unit 105b then determines whether there is a sound source region in which the number of sound sources has increased (step S1023).

When a sound source is detected and the detection duration is being measured for every detected sound source region (step S1019; No), the abnormal state detection unit 105b skips steps S1020 to S1022 and performs the determination of step S1023. When there is no sound source region for which the non-detection duration is being measured (step S1021; No), the abnormal state detection unit 105b skips step S1022 and performs the determination of step S1023.

The abnormal state detection unit 105b performs the determination of step S1023 based on the detection result for the number of sound sources. When there is a sound source region in which the number of sound sources has increased (step S1023; Yes), the abnormal state detection unit 105b next determines whether the time elapsed since the number of sound sources increased is equal to or greater than the second time threshold TH2 (step S1024). When the elapsed time is equal to or greater than the second time threshold TH2 (step S1024; Yes), the abnormal state detection unit 105b causes the attention signal output unit 105c to output a second attention signal (step S1025). As the second attention signal, the attention signal output unit 105c transmits, for example, an e-mail containing a message indicating that the number of sound sources has remained increased for a predetermined period to a pre-registered e-mail address. After step S1025, the detection result output unit 105 ends the process of outputting the detection result, as shown in FIG. 15A.

When there is no sound source region in which the number of sound sources has increased (step S1023; No), the abnormal state detection unit 105b determines that no abnormality has been detected. In this case, the detection result output unit 105 skips steps S1024 and S1025 and ends the process of outputting the detection result. Likewise, when the time elapsed since the number of sound sources increased in such a sound source region is shorter than the second time threshold TH2 (step S1024; No), the abnormal state detection unit 105b determines that no abnormality has been detected. In this case as well, the detection result output unit 105 skips step S1025 and ends the process of outputting the detection result.

In the modification of the process of outputting the detection result shown in FIGS. 15A and 15B, the detection result output unit 105 outputs the first attention signal when the non-detection duration of the sound source reaches or exceeds the first time threshold TH1. Therefore, this modification also makes it possible to notify a person who is remote from the space in which sound sources are detected that an abnormal state has occurred in which no sound source has been detected for a long time.

In the above modification, the detection result output unit 105 also outputs the second attention signal when the number of sound sources in a sound source region has increased and the time elapsed since the increase reaches or exceeds the second time threshold TH2.
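The two conditions of this modification can be sketched as a small per-frame state machine. This is a minimal illustration under stated assumptions, not the device's actual implementation: the class name `VariantMonitor`, its fields, and the use of seconds are hypothetical, a single sound source region is tracked for brevity, and the sketch re-emits a signal on every frame while the condition holds because the text does not say whether repeats are suppressed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VariantMonitor:
    """Hypothetical tracker for the first and second attention signals."""
    th1: float  # first time threshold TH1 (assumed to be in seconds)
    th2: float  # second time threshold TH2 (assumed to be in seconds)
    no_source_since: Optional[float] = None  # start of non-detection timing
    increased_since: Optional[float] = None  # time the source count increased

    def update(self, now: float, source_detected: bool,
               count_increased: bool) -> List[str]:
        signals: List[str] = []
        if not source_detected:
            self.increased_since = None
            if self.no_source_since is None:
                # Non-detection timing not running yet: start it.
                self.no_source_since = now
            elif now - self.no_source_since >= self.th1:
                # Corresponds to steps S1017 and S1018.
                signals.append("first")
            return signals
        # A source is detected, so non-detection timing ends (step S1022).
        self.no_source_since = None
        if count_increased:
            if self.increased_since is None:
                self.increased_since = now
            if now - self.increased_since >= self.th2:
                # Corresponds to steps S1024 and S1025.
                signals.append("second")
        else:
            self.increased_since = None
        return signals
```

Calling `update` once per frame with the current time and the detection results returns the list of attention signals to output for that frame.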

For example, when the space in which sound sources are detected is the interior of a house, people and various devices in the space emit sound and are therefore detected as sound sources. Consequently, when a device that can be a sound source, such as a ventilation fan, is operated routinely in the house, that device may continue to be detected as a first sound source for a long time. If, in this situation, a second sound source appears in substantially the same direction as the first sound source as seen from the first microphone 201 and the second microphone 202, the sound source detection device 1 detects that the number of sound sources in the sound source region containing the first sound source has increased (that is, that the second sound source has appeared). The second sound source may be a person living in the house, or it may be a device other than the first sound source. When the second sound source is a person living in the house or a device operated by that person, the person or device rarely continues to emit sound for a long time. Therefore, when the state in which the number of sound sources in a sound source region has increased continues for a long time, some abnormality may have occurred in that sound source region. By outputting an attention signal when this state has continued for a long time, the sound source detection device 1 lets its user learn early that an abnormality may have occurred in which an unidentified sound source remains present for a long time, and respond to the abnormality early.

[Fourth Embodiment]
FIG. 16 is a diagram illustrating the functional configuration of a sound source detection device according to the fourth embodiment.

As shown in FIG. 16, the sound source detection device 1 according to the present embodiment includes an audio signal reception unit 101, a conversion unit 102, a phase difference calculation unit 103, a sound source region / sound source number detection unit 104, a detection result output unit 105, and a sound source information management unit 106.

The sound source region / sound source number detection unit 104 determines the phase difference regions in which sound sources exist and the number of sound sources, based on the phase difference of each frequency component calculated by the phase difference calculation unit 103. Although omitted from FIG. 16, the sound source region / sound source number detection unit 104 includes a frequency component number calculation unit 104a, a phase difference component ratio calculation unit 104b, a component ratio transition information calculation unit 104c, and a component ratio holding unit 104d.

The detection result output unit 105 outputs the result of the sound source detection process to an external device based on the processing result of the sound source region / sound source number detection unit 104. The detection result output unit 105 includes a duration measurement unit 105a, an abnormal state detection unit 105b, an attention signal output unit 105c, and a detected sound source information holding unit 105d.

The sound source information management unit 106 manages information (sound source information) about specific sound sources in the space in which sound sources are detected. For example, when that space is the interior of a house, the specific sound sources are a television, a ventilation fan, a water tap, and the like. The sound source information management unit 106 includes a sound source information registration unit 106a and a sound source information holding unit 106b.

The sound source information registration unit 106a generates sound source information for a specific sound source based on the phase difference of each frequency component (phase spectrum difference) obtained while only that sound source is emitting sound. The generated sound source information includes information on the phase difference region (sound source region) in which the specific sound source exists. The sound source information registration unit 106a causes the sound source information holding unit 106b to hold (store) the generated sound source information. The abnormal state detection unit 105b of the detection result output unit 105 refers to the sound source information held by the sound source information holding unit 106b when detecting an abnormal state.

Separately from the sound source detection process that follows, for example, the flowcharts shown in FIGS. 2A and 2B, the sound source detection device 1 of the present embodiment performs a process of generating and registering sound source information about specific sound sources in the space in which sound sources are detected. To register sound source information, the space is first put into a state in which only the specific sound source is emitting sound. The operator of the sound source detection device 1 then operates an operation unit (not shown in FIG. 16) to run the device in a mode for registering sound source information. In this mode, the sound source detection device 1 converts the audio signals input from the first microphone 201 and the second microphone 202 into frequency spectra and calculates the phase difference for each frequency component. The conversion unit 102 converts the audio signals into frequency spectra, and the phase difference calculation unit 103 calculates the phase difference for each frequency component. While the device operates in the registration mode, the phase difference calculation unit 103 sends the calculated phase difference of each frequency component to the sound source information management unit 106. The sound source information management unit 106 generates sound source information based on the received phase differences and causes the sound source information holding unit 106b to hold it. The sound source information management unit 106 (sound source information registration unit 106a), for example, calculates the number of frequency components in each phase difference region from the phase difference of each frequency component and takes the phase difference region with the largest number of frequency components as the sound source region of the specific sound source being registered. Note that the process of determining the sound source region of the specific sound source may be performed by the sound source region / sound source number detection unit 104 instead of the sound source information management unit 106 (sound source information registration unit 106a).
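The region selection performed at registration can be sketched as follows. This is a minimal illustration rather than the device's actual implementation: it assumes, for simplicity, that each phase difference region is a half-open interval of phase difference values delimited by `region_edges`, and the function name `register_source_region` is hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from typing import Optional, Sequence


def register_source_region(phase_diffs: Sequence[float],
                           region_edges: Sequence[float]) -> Optional[int]:
    """Return the index of the region holding the most frequency components.

    phase_diffs  -- one phase difference per frequency component
    region_edges -- ascending boundaries; region i covers
                    [region_edges[i], region_edges[i + 1])  (assumed layout)
    """
    counts: Counter = Counter()
    for pd in phase_diffs:
        idx = bisect_right(region_edges, pd) - 1
        if 0 <= idx < len(region_edges) - 1:
            counts[idx] += 1  # count the component in its region
    if not counts:
        return None  # no component fell inside any region
    # The region with the largest component count becomes the
    # registered sound source region.
    return counts.most_common(1)[0][0]
```

For instance, with three regions over the range from -π to π, a recording in which most components have phase differences near zero registers the middle region.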

After sound source information has been registered by the above procedure, the operator of the sound source detection device 1 starts sound source detection by operating the operation unit (not shown in FIG. 16) to run the device in a mode for performing the sound source detection process. Once the process is started, the sound source detection device 1 performs the sound source detection process according to the flowcharts shown in FIGS. 2A and 2B. In doing so, the detection result output unit 105 of the sound source detection device 1 performs the processing shown in FIGS. 17A to 17D as the process of outputting the detection result (step S10).

FIG. 17A is a flowchart (part 1) illustrating the process of outputting the detection result according to the fourth embodiment. FIG. 17B is a flowchart (part 2) illustrating the same process. FIG. 17C is a flowchart (part 3) illustrating the same process. FIG. 17D is a flowchart (part 4) illustrating the same process.

In the process of outputting the detection result according to the present embodiment, the detection result output unit 105 first holds the detection results for the sound source regions and the number of sound sources acquired from the sound source region / sound source number detection unit 104, as shown in FIG. 17A (step S1031). The detection result output unit 105 stores these detection results in the detected sound source information holding unit 105d in association with the frame number x.

Next, the detection result output unit 105 compares the detected sound source regions with the sound source regions included in the registered sound source information (step S1032). Step S1032 is performed by the abnormal state detection unit 105b. The abnormal state detection unit 105b reads the sound source information held by the sound source information holding unit 106b of the sound source information management unit 106 and compares the phase difference regions registered as sound source regions in that information with the phase difference regions detected as sound source regions by the sound source region / sound source number detection unit 104.

Next, the abnormal state detection unit 105b determines whether a sound source has been detected in any phase difference region that is not registered as a sound source region (step S1033). When a sound source has been detected in an unregistered phase difference region (step S1033; Yes), the abnormal state detection unit 105b performs steps S1034 to S1037. Steps S1034 to S1037 detect a first abnormal state, in which an unknown sound source has been present for a predetermined time in a phase difference region that is not registered as a sound source region. When no sound source has been detected in any unregistered phase difference region (step S1033; No), the abnormal state detection unit 105b performs steps S1038 to S1041. Steps S1038 to S1041 detect a second abnormal state, in which no sound source has been detected in the unregistered phase difference regions for a predetermined time or longer.

When performing the process of detecting the first abnormal state, the abnormal state detection unit 105b performs steps S1034 to S1037 on those phase difference regions that are not registered as sound source regions and in which a sound source has been detected. The abnormal state detection unit 105b first determines whether any of these target phase difference regions is one whose sound source detection duration is not being measured (step S1034).

When there is no target phase difference region whose detection duration is not being measured (step S1034; No), the abnormal state detection unit 105b next determines whether there is a phase difference region whose detection duration is equal to or greater than the first time threshold TH1 (step S1036). When there is a phase difference region whose detection duration is not being measured (step S1034; Yes), the abnormal state detection unit 105b causes the duration measurement unit 105a to start measuring the detection duration for that phase difference region (step S1035), and then determines whether there is a phase difference region whose detection duration is equal to or greater than the first time threshold TH1 (step S1036).

In step S1036, the abnormal state detection unit 105b examines those phase difference regions that are not registered as sound source regions and in which a sound source has been detected. When there is a phase difference region whose detection duration is equal to or greater than the first time threshold TH1 (step S1036; Yes), the abnormal state detection unit 105b causes the attention signal output unit 105c to output a first attention signal (step S1037). In step S1037, the attention signal output unit 105c transmits, for example, an e-mail containing a message notifying that an unknown sound source has been present for the predetermined time or longer to a pre-registered e-mail address. Thereafter, the abnormal state detection unit 105b determines, as shown in FIG. 17C, whether any phase difference region registered as a sound source region is one in which no sound source has been detected (step S1042). When there is no phase difference region whose detection duration is equal to or greater than the first time threshold TH1 (step S1036; No), the abnormal state detection unit 105b skips step S1037 and proceeds to the determination of step S1042.

When performing the process of detecting the second abnormal state, on the other hand, the abnormal state detection unit 105b performs steps S1038 to S1041 shown in FIG. 17B on all phase difference regions that are not registered as sound source regions. The abnormal state detection unit 105b first determines whether any of the unregistered phase difference regions is one whose sound source non-detection duration is not being measured (step S1038).

When the non-detection duration is being measured for all target phase difference regions (step S1038; No), the abnormal state detection unit 105b next determines whether there is a phase difference region whose non-detection duration is equal to or greater than the second time threshold TH2 (step S1040). When there is a phase difference region whose non-detection duration is not being measured (step S1038; Yes), the abnormal state detection unit 105b causes the duration measurement unit 105a to start measuring the non-detection duration for that phase difference region (step S1039). In step S1039, if the sound source detection duration is being measured for any target phase difference region, the abnormal state detection unit 105b also ends the measurement of the detection duration for that region. Thereafter, the abnormal state detection unit 105b determines whether there is a phase difference region whose non-detection duration is equal to or greater than the second time threshold TH2 (step S1040).

In step S1040, the abnormal state detection unit 105b examines all phase difference regions that are not registered as sound source regions. When there is a phase difference region whose non-detection duration is equal to or greater than the second time threshold TH2 (step S1040; Yes), the abnormal state detection unit 105b causes the attention signal output unit 105c to output a second attention signal (step S1041). In step S1041, the attention signal output unit 105c transmits, for example, an e-mail containing a message notifying that no sound source has been detected for the predetermined time or longer in the phase difference regions not registered as sound source regions to a pre-registered e-mail address. Thereafter, the abnormal state detection unit 105b determines, as shown in FIG. 17C, whether any phase difference region registered as a sound source region is one in which no sound source has been detected (step S1042). When there is no phase difference region whose non-detection duration is equal to or greater than the second time threshold TH2 (step S1040; No), the abnormal state detection unit 105b skips step S1041 and proceeds to the determination of step S1042.

The processes of detecting the first abnormal state and the second abnormal state described above both detect abnormal states in phase difference regions that are not registered as sound source regions. In contrast, the processing from step S1042 onward detects abnormal states in phase difference regions that are registered as sound source regions.
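The handling of the unregistered phase difference regions (steps S1033 to S1041) can be sketched as below. This is a rough sketch under stated assumptions, not the patented implementation: regions are represented by integer indices, timers are plain dictionaries mapping a region to its start time, and the function name `check_unregistered_regions` is hypothetical. Clearing the non-detection timer when a source appears is also an assumption; the text states the corresponding clearing only for the detection timer in step S1039.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def check_unregistered_regions(
    now: float,
    all_regions: Iterable[int],
    registered: Set[int],
    detected: Set[int],
    detect_start: Dict[int, float],
    nodetect_start: Dict[int, float],
    th1: float,
    th2: float,
) -> Tuple[Optional[str], List[int]]:
    """Return (attention signal name or None, regions that triggered it)."""
    unregistered = set(all_regions) - set(registered)
    unknown = unregistered & set(detected)  # S1033
    if unknown:
        for r in unknown:
            nodetect_start.pop(r, None)  # assumption, not stated for S1035
            detect_start.setdefault(r, now)  # S1034 and S1035
        over = sorted(r for r in unknown
                      if now - detect_start[r] >= th1)  # S1036
        return ("first", over) if over else (None, [])
    for r in unregistered:
        detect_start.pop(r, None)  # end detection timing (S1039)
        nodetect_start.setdefault(r, now)  # S1038 and S1039
    over = sorted(r for r in unregistered
                  if now - nodetect_start[r] >= th2)  # S1040
    return ("second", over) if over else (None, [])
```

The caller would invoke this once per frame and pass the returned signal, if any, to whatever component sends the notification.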

In step S1042, the abnormal state detection unit 105b determines whether any phase difference region registered as a sound source region is one in which no sound source has been detected. When there is such a phase difference region (step S1042; Yes), the abnormal state detection unit 105b next performs steps S1043 to S1046 shown in FIG. 17C. Steps S1043 to S1046 detect a third abnormal state, in which no sound source has been detected for a predetermined period or longer in a phase difference region registered as a sound source region. After steps S1043 to S1046, the abnormal state detection unit 105b performs steps S1047 and S1048 and steps S1049 to S1051, shown in FIG. 17D. Steps S1047 and S1048 start the measurement of the detection duration for phase difference regions whose sound source detection duration is not being measured. Steps S1049 to S1051 detect a fourth abnormal state, in which an unknown sound source has continued to exist for a predetermined period or longer in a phase difference region registered as a sound source region. When a sound source has been detected in every phase difference region registered as a sound source region (step S1042; No), the abnormal state detection unit 105b skips steps S1043 to S1046 and performs steps S1047 to S1051.

When performing the process of detecting the third abnormal state, the abnormal state detection unit 105b performs steps S1043 to S1046 on those phase difference regions that are registered as sound source regions and in which no sound source has been detected. The abnormal state detection unit 105b first determines whether any of these target phase difference regions is one whose sound source non-detection duration is not being measured (step S1043).

When there is a phase difference region whose non-detection duration is not being measured (step S1043; Yes), the abnormal state detection unit 105b causes the duration measurement unit 105a to start measuring the non-detection duration for that phase difference region (step S1044). In step S1044, if the sound source detection duration is being measured for any target phase difference region, the abnormal state detection unit 105b also ends the measurement of the detection duration for that region. Thereafter, the abnormal state detection unit 105b determines whether there is a phase difference region whose non-detection duration is equal to or greater than the third time threshold TH3 (step S1045). When the non-detection duration is being measured for all target phase difference regions (step S1043; No), the abnormal state detection unit 105b skips step S1044 and performs the determination of step S1045.

In step S1045, the abnormal state detection unit 105b examines those phase difference regions that are registered as sound source regions and in which no sound source has been detected. When there is a phase difference region whose non-detection duration is equal to or greater than the third time threshold TH3 (step S1045; Yes), the abnormal state detection unit 105b causes the attention signal output unit 105c to output a third attention signal (step S1046). In step S1046, the attention signal output unit 105c transmits, for example, an e-mail containing a message notifying that a registered sound source has not been detected for the predetermined period or longer to a pre-registered e-mail address. Thereafter, the abnormal state detection unit 105b determines, as shown in FIG. 17D, whether any phase difference region registered as a sound source region is one whose sound source detection duration is not being measured (step S1047). When there is no phase difference region whose non-detection duration is equal to or greater than the third time threshold TH3 (step S1045; No), the abnormal state detection unit 105b skips step S1046 and proceeds to the determination of step S1047.
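As one possible illustration of how an attention signal might be turned into a notification e-mail, the sketch below builds a message with Python's standard `email.message.EmailMessage`. Everything specific here is an assumption made for the example: the function name `build_attention_mail`, the message wording, the subject line, and the `example.com` addresses are all hypothetical, and actually sending the message (for instance over SMTP) is left out.

```python
from email.message import EmailMessage

# Hypothetical wording for the first to third attention signals of the
# fourth embodiment; the text gives only the gist of each message.
MESSAGES = {
    1: "An unknown sound source has persisted for the set time or longer.",
    2: "No sound source was detected in unregistered regions for the set time.",
    3: "A registered sound source has not been detected for the set period.",
}


def build_attention_mail(signal_no: int, to_addr: str,
                         from_addr: str = "detector@example.com") -> EmailMessage:
    """Build (but do not send) the e-mail for the given attention signal."""
    msg = EmailMessage()
    msg["Subject"] = f"Attention signal {signal_no}"
    msg["From"] = from_addr
    msg["To"] = to_addr  # stands in for the pre-registered address
    msg.set_content(MESSAGES[signal_no])
    return msg
```

Keeping message construction separate from delivery makes it straightforward to swap the transport without touching the detection logic.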

On the other hand, when the process of detecting the third abnormal state is not performed (step S1042; No), the abnormal state detection unit 105b skips steps S1043 to S1046 shown in FIG. 17C and proceeds to the determination in step S1047.

In step S1047, the abnormal state detection unit 105b determines whether the phase difference regions registered as sound source regions include a region for which the detection duration of a sound source is not being measured. When there is such a region (step S1047; Yes), the abnormal state detection unit 105b causes the duration measurement unit 105a to start measuring the detection duration for that phase difference region (step S1048). In step S1048, if the phase difference regions being processed include a region for which the non-detection duration of a sound source is being measured, the abnormal state detection unit 105b ends the measurement of the non-detection duration for that region. The abnormal state detection unit 105b then determines whether the phase difference regions registered as sound source regions include a region in which the number of sound sources has increased (step S1049). When the detection duration is already being measured for all of the phase difference regions registered as sound source regions (step S1047; No), the abnormal state detection unit 105b skips step S1048 and proceeds to the determination in step S1049.

In step S1049, the abnormal state detection unit 105b takes as determination targets those phase difference regions registered as sound source regions in which a sound source was detected. Based on the detection results for the sound source regions and the number of sound sources, the abnormal state detection unit 105b determines whether there is a phase difference region (sound source region) in which the number of sound sources has increased. When there is such a region (step S1049; Yes), the abnormal state detection unit 105b next determines whether the time elapsed since the number of sound sources increased is equal to or greater than the fourth time threshold TH4 (step S1050). For example, the abnormal state detection unit 105b calculates the time elapsed since the number of sound sources increased from the history of detection results for the sound source regions and the number of sound sources held in the detected sound source information holding unit 105d, and compares it with the fourth time threshold TH4. When the elapsed time is equal to or greater than the fourth time threshold TH4 (step S1050; Yes), the abnormal state detection unit 105b causes the attention signal output unit 105c to output a fourth attention signal (step S1051). In step S1051, the attention signal output unit 105c, for example, sends an e-mail containing a message notifying that an unknown sound source has remained for a predetermined time or longer in a phase difference region registered as a sound source region to a mail address registered in advance. After step S1051, the detection result output unit 105 ends the process of outputting the detection result (return).

On the other hand, when there is no phase difference region in which the number of sound sources has increased (step S1049; No), the abnormal state detection unit 105b determines that the fourth abnormal state, in which an unknown sound source remains for a predetermined time or longer in a phase difference region registered as a sound source region, has not been detected. In this case, the detection result output unit 105 skips steps S1050 and S1051 and ends the process of outputting the detection result. Likewise, when the time elapsed since the number of sound sources increased in such a phase difference region is shorter than the fourth time threshold TH4 (step S1050; No), the abnormal state detection unit 105b determines that the fourth abnormal state has not been detected. In this case, the detection result output unit 105 skips step S1051 and ends the process of outputting the detection result.
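The determination in steps S1049 to S1051 can be illustrated with a short sketch that derives the elapsed time from a history of detection results, as the detected sound source information holding unit 105d is described as keeping. The history format (a list of timestamp and source-count pairs for one registered region), the `baseline` parameter standing for the number of sources expected in that region, and the function names are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

# (timestamp in seconds, number of sources detected in one registered region)
History = List[Tuple[float, int]]


def time_since_increase(history: History, baseline: int) -> Optional[float]:
    """Time for which the source count has stayed above `baseline`.

    Returns None when the most recent count is not above the baseline,
    which corresponds to the "No" branch of step S1049.
    """
    if not history or history[-1][1] <= baseline:
        return None
    start = history[-1][0]
    # Walk backwards to the first sample of the current run above baseline.
    for timestamp, count in reversed(history):
        if count <= baseline:
            break
        start = timestamp
    return history[-1][0] - start


def fourth_state_detected(history: History, baseline: int, th4: float) -> bool:
    """Step S1050: is the elapsed time since the increase at least TH4?"""
    elapsed = time_since_increase(history, baseline)
    return elapsed is not None and elapsed >= th4
```

When `fourth_state_detected` returns true, the caller would output the fourth attention signal (step S1051); otherwise the output process simply ends.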

As described above, the sound source detection processing according to the present embodiment detects an abnormal state in the space in which sound sources are detected, based on the temporal change in the sound source detection results for phase difference regions other than those registered in advance as sound source regions. Therefore, when the interior of a house is the detection target, registering as sound source regions the phase difference regions that contain sound sources fixed at specific positions, such as a television, a ventilation fan, or a water tap, makes it possible to detect early the occurrence of a sound source different from these. The sound source detection processing according to the present embodiment also outputs an attention signal when no sound source is detected for a long time in the phase difference regions other than those registered as sound source regions. Therefore, for example, an abnormality can also be detected early when something has happened to a person living in the house (the space in which sound sources are detected) and that person can no longer perform actions that produce sound.

Furthermore, the sound source detection processing according to the present embodiment outputs an attention signal when a state in which the number of sound sources has increased continues for a long time in a phase difference region registered as a sound source region. This makes it possible to detect early an abnormal state in which an unregistered sound source remains for a long time in substantially the same direction as a registered sound source as viewed from the first microphone 201 and the second microphone 202.

In addition, the sound source detection processing according to the present embodiment outputs an attention signal when a state in which no sound source is detected in a phase difference region registered as a sound source region continues for a long time. When the space in which sound sources are detected is the interior of a house, devices that are used frequently in daily life, such as a television, a ventilation fan, or a water tap, are registered as sound sources. Therefore, when a frequently used device registered as a sound source goes undetected for a long time, it is conceivable either that some abnormality has occurred in the registered device and it is not operating, or that the person living in the house is unable to use the registered device. Accordingly, by outputting an attention signal when no sound source is detected for a long time in a phase difference region registered as a sound source region, as in the present embodiment, the user can learn early of the possibility that an abnormality has occurred in relation to a registered sound source.
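The four monitored conditions summarized above can be separated from a single detection result as in the following sketch. It is a hedged illustration only: the function name, the dictionary keys, the use of integer region indices, and the `expected` mapping (number of sources registered per region, defaulting to one) are assumptions that do not appear in the embodiment.

```python
from typing import Dict, Set


def monitored_conditions(all_regions: Set[int], registered: Set[int],
                         detected: Set[int], counts: Dict[int, int],
                         expected: Dict[int, int]) -> dict:
    """Split one detection result into the four monitored conditions.

    Each condition still has to persist for its own time threshold
    before an attention signal is output.
    """
    unregistered = all_regions - registered
    return {
        # A source is present in a region that is not registered.
        "unregistered_active": sorted(unregistered & detected),
        # No source is present in any unregistered region.
        "unregistered_silent": not (unregistered & detected),
        # A registered region currently shows no source.
        "registered_missing": sorted(registered - detected),
        # A registered region shows more sources than registered for it.
        "registered_extra": sorted(
            r for r in registered & detected
            if counts.get(r, 0) > expected.get(r, 1)
        ),
    }
```

Keeping the classification separate from the duration measurement lets each condition be paired with its own timer and threshold.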

The functional configurations of the sound source detection device 1 shown in the first to fourth embodiments are merely examples, and other configurations may be used as long as the sound source detection processing described in each embodiment can be executed. For example, the sound source detection device 1 may be configured to acquire an image, captured by an imaging device, of the space in which sound sources are detected, and to output the acquired image to an external device together with the attention signal when an abnormal state is detected.

The flowcharts shown in FIGS. 2A and 2B are merely an example of the sound source detection processing, and the processing content and procedure can be changed as appropriate. Similarly, the flowcharts shown in FIGS. 3A and 3B are merely an example of the process of specifying the number of sound sources, and the processing content and procedure can be changed as appropriate. The flowchart shown in FIG. 12 is merely an example of the process of calculating the phase spectrum difference, and the processing content and procedure can be changed as appropriate. In addition, the flowcharts shown in FIG. 14, FIGS. 15A and 15B, and FIGS. 17A to 17D are all merely examples of the process of outputting the detection result, and the processing content and procedure can be changed as appropriate.

The sound source detection device 1 according to the first to fourth embodiments can be realized using, for example, a computer and a program executed by the computer. A sound source detection device realized using a computer and a program is described below with reference to FIG. 18.

FIG. 18 is a diagram illustrating a hardware configuration of a computer.
As illustrated in FIG. 18, the computer 7 includes a central processing unit (CPU) 701, a main storage device 702, an auxiliary storage device 703, an input device 704, and an output device 705. The computer 7 also includes an interface device 706, a communication control device 707, and a storage medium drive device 708. These elements 701 to 708 of the computer 7 are connected to one another by a bus 710, so that data can be exchanged among them.

The CPU 701 is an arithmetic processing unit that controls the overall operation of the computer 7 by executing various programs, including an operating system.

The main storage device 702 includes a read only memory (ROM) and a random access memory (RAM), neither of which is shown. The ROM of the main storage device 702 stores in advance, for example, a predetermined basic control program that the CPU 701 reads when the computer 7 starts up. The RAM of the main storage device 702 is used as a working storage area as needed when the CPU 701 executes various programs. The RAM of the main storage device 702 can be used to store, for example, the frequency spectrum of the audio signal being processed, the phase difference of each frequency component, and the phase difference component ratio.

The auxiliary storage device 703 is a storage device, such as a hard disk drive (HDD) or a solid state drive (SSD), that has a larger capacity than the main storage device 702. The auxiliary storage device 703 can store various programs executed by the CPU 701, various data, and the like. The auxiliary storage device 703 can be used to store, for example, a sound source detection program including the processing shown in FIGS. 2A and 2B. The auxiliary storage device 703 can also be used to store, for example, the frequency spectrum of the audio signal being processed, the phase difference for each frequency component, the phase difference component ratio, and sound source information for specific sound sources.

The input device 704 is, for example, a keyboard device or a touch panel device. When the operator (user) of the computer 7 performs an operation on the input device 704, such as pressing it, the input device 704 transmits the input information associated with that operation to the CPU 701. The input device 704 is used, for example, to start and stop the computer 7 (sound source detection device 1) and to switch its operation mode.

The output device 705 is, for example, a liquid crystal display or a printer. The output device 705 is used, for example, to display the operating state of the computer 7 (sound source detection device 1).

The interface device 706 connects the computer 7 to other electronic devices and includes, for example, a connector conforming to the Universal Serial Bus (USB) standard. Devices connectable to the computer 7 through the interface device 706 include the microphone array 2, which includes a plurality of microphones.

The communication control device 707 controls various kinds of communication between the computer 7 and another communication terminal 9 over a network 8 such as a telephone network or the Internet. The communication control device 707 is used, for example, to output attention information when an abnormal state is detected.

The storage medium drive device 708 reads programs and data recorded on a portable storage medium (not shown) and writes data and the like stored in the auxiliary storage device 703 to the portable storage medium. As the portable storage medium, for example, a flash memory equipped with a USB-standard connector can be used. Optical discs such as a Compact Disc (CD), a Digital Versatile Disc (DVD), and a Blu-ray Disc (Blu-ray is a registered trademark) can also be used as the portable storage medium.

In the computer 7, the CPU 701 reads a program including the processing of FIGS. 2A and 2B from the auxiliary storage device 703 or the like and executes it, thereby detecting the sound source regions and the number of sound sources based on the audio signals input from the microphone array 2. The computer 7 also performs processing for detecting an abnormal state based on the detection results for the sound source regions and the number of sound sources and, when an abnormal state occurs, outputs an attention signal to a predetermined communication terminal 9 using the communication control device 707.
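As a rough illustration of the kind of computation such a program performs, the sketch below counts frequency components per phase difference region and derives their ratios from two already-computed spectra. It rests on several assumptions that the text does not state: the regions are taken to be equal-width sectors bounded by lines through the origin of the frequency versus phase difference plane, the phase difference at the highest bin is assumed to span at most plus or minus pi, and all function and parameter names (`phase_differences`, `count_per_region`, `component_ratios`, `source_count_changed`, `floor`, `max_change`) are hypothetical.

```python
import cmath
import math
from typing import List, Optional, Sequence


def phase_differences(spec1: Sequence[complex], spec2: Sequence[complex],
                      floor: float = 0.0) -> List[Optional[float]]:
    """Phase difference per bin in [-pi, pi]; weak bins are skipped (None)."""
    diffs: List[Optional[float]] = []
    for a, b in zip(spec1, spec2):
        if abs(a) <= floor or abs(b) <= floor:
            diffs.append(None)  # at or below the assumed noise floor
        else:
            diffs.append(cmath.phase(a * b.conjugate()))
    return diffs


def count_per_region(diffs: Sequence[Optional[float]],
                     n_regions: int) -> List[int]:
    """Count frequency components falling into each sector-shaped region."""
    counts = [0] * n_regions
    top = len(diffs) - 1  # index of the highest frequency bin
    for k, diff in enumerate(diffs):
        if k == 0 or diff is None:
            continue  # the DC bin carries no direction information
        # Normalize by frequency so one direction maps to one slope.
        slope = max(-1.0, min(1.0, diff / (math.pi * k / top)))
        index = min(int((slope + 1.0) / 2.0 * n_regions), n_regions - 1)
        counts[index] += 1
    return counts


def component_ratios(counts: Sequence[int]) -> List[float]:
    """Share of counted frequency components belonging to each region."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def source_count_changed(previous_ratio: float, current_ratio: float,
                         max_change: float) -> bool:
    """Flag a change in source count when the ratio moves by more than max_change."""
    return abs(current_ratio - previous_ratio) > max_change
```

In use, the spectra would come from a short-time Fourier transform of each microphone signal, and the ratios from successive frames would be compared to follow how the share of each region changes over time.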

The computer 7 used as the sound source detection device 1 need not include all of the components shown in FIG. 18, and some components may be omitted depending on the application and conditions. For example, the storage medium drive device 708 may be omitted from the computer 7. As another example, the microphone array 2 may be connected directly to a printed circuit board of the computer 7, and the interface device 706 may be omitted.

The following additional notes are disclosed for each of the embodiments described above.
(Appendix 1)
A sound source detection device comprising:
a phase difference calculation unit that calculates a phase difference for each frequency component in audio signals, based on frequency spectra of a plurality of audio signals acquired from a plurality of microphones of a microphone array device; and
a sound source detection unit that detects a direction in which a sound source exists and the number of sound sources, based on the calculated phase difference for each frequency component,
wherein the sound source detection unit includes:
a frequency component number calculation unit that calculates the number of frequency components for each phase difference region, based on a plurality of phase difference regions divided according to the direction of arrival of sound arriving at the microphones and on the calculated phase difference for each frequency component;
a phase difference component ratio calculation unit that calculates a ratio of the numbers of frequency components of the plurality of phase difference regions in the audio signals, based on the calculated number of frequency components for each phase difference region; and
a transition information calculation unit that calculates transition information of the ratio of the numbers of frequency components, based on a temporal change in the ratio,
and wherein the sound source detection unit determines the phase difference region in which the sound source exists based on the number of frequency components for each phase difference region, and determines the number of sound sources in that phase difference region based on the transition information of the ratio of the numbers of frequency components.
(Appendix 2)
The sound source detection device according to Appendix 1, wherein the sound source detection unit determines that the number of sound sources in the phase difference region in which the sound source exists has changed when the amount of change in the ratio of the number of frequency components for that phase difference region exceeds a predetermined amount of change.
(Appendix 3)
The sound source detection device according to Appendix 1, wherein the sound source detection unit determines that the number of sound sources in the phase difference region in which the sound source exists has increased when the ratio of the number of frequency components for that phase difference region becomes equal to or greater than a predetermined threshold.
(Appendix 4)
The sound source detection device according to Appendix 1, wherein the sound source detection unit extracts a predetermined number of phase difference regions from the plurality of phase difference regions in descending order of the number of frequency components (S6), and determines the phase difference region in which the sound source exists based on the numbers of frequency components in the extracted predetermined number of phase difference regions.
(Appendix 5)
The sound source detection device according to Appendix 1, further comprising a stationary noise estimation unit that estimates an amplitude level of stationary noise included in the audio signals acquired from the microphones,
wherein the phase difference calculation unit calculates the phase difference of those frequency components, among the frequency components in the frequency spectra of the audio signals, whose amplitude spectrum is larger than the amplitude level of the stationary noise.
(Appendix 6)
The sound source detection device according to Appendix 1, further comprising an abnormal state detection unit that detects an abnormal state in the detection target space of the sound source, based on the detection results for the phase difference region in which the sound source exists and for the number of sound sources.
(Appendix 7)
The sound source detection device according to Appendix 6, wherein the abnormal state detection unit detects the abnormal state based on the duration of a state in which the sound source is not detected.
(Appendix 8)
The sound source detection device according to Appendix 6, wherein the abnormal state detection unit detects the abnormal state based on the duration of a state in which the number of sound sources has increased in the phase difference region in which the sound source exists.
(Appendix 9)
The sound source detection device according to Appendix 6, further comprising a sound source information holding unit that holds sound source information including information indicating the phase difference region in which the sound source exists,
wherein the abnormal state detection unit detects an abnormal state in the detection target space of the sound source, based on the detection results for the phase difference region in which the sound source exists and for the number of sound sources, and on the sound source information.
(Appendix 10)
The sound source detection device according to Appendix 9, wherein the abnormal state detection unit determines that the abnormal state exists when a state in which no sound source is detected in the phase difference regions other than the phase difference region in which the sound source exists according to the sound source information has continued for a predetermined time or more.
(Appendix 11)
The sound source detection device according to Appendix 9, wherein the abnormal state detection unit determines that the abnormal state exists when the phase difference regions other than the phase difference region in which the sound source exists according to the sound source information include a phase difference region in which a state of a sound source being detected has continued for a predetermined time or more.
(Appendix 12)
The sound source detection device according to Appendix 9, wherein the abnormal state detection unit determines that the abnormal state exists when the phase difference regions in which the sound source exists according to the sound source information include a phase difference region in which the sound source detection unit detected no sound source, and the state in which no sound source is detected in that phase difference region has continued for a predetermined time.
(Appendix 13)
The sound source detection device according to Appendix 9, wherein the abnormal state detection unit determines that the abnormal state exists when the phase difference regions in which the sound source exists according to the sound source information include a phase difference region in which a state of an increased number of sound sources has continued for a predetermined time or more.
(Appendix 14)
A sound source detection method in which a computer executes processing comprising:
converting a plurality of audio signals acquired from a plurality of microphones of a microphone array device into frequency spectra and calculating a phase difference for each frequency component;
calculating the number of frequency components for each phase difference region, based on a plurality of phase difference regions divided according to the direction of arrival of sound arriving at the microphones and on the calculated phase difference for each frequency component;
determining the phase difference region in which a sound source exists, based on the calculated number of frequency components for each phase difference region;
calculating a ratio of the numbers of frequency components of the plurality of phase difference regions in the audio signals, based on the frequency components for each phase difference region; and
determining the number of sound sources in the phase difference region in which the sound source exists, based on a temporal change in the ratio of the numbers of frequency components.
(Appendix 15)
In the process of determining the number of sound sources, when the amount of change in the ratio of the number of frequency components in the phase difference region where the sound source exists exceeds a predetermined amount of change, the computer counts the number of sound sources in the phase difference region. Is determined to have changed,
15. The sound source detection method according to appendix 14, wherein
(Appendix 16)
In the process of determining the number of sound sources, the computer increases the number of sound sources in the phase difference region when the ratio of the number of frequency components in the phase difference region in which the sound source exists is equal to or greater than a predetermined threshold. To determine,
15. The sound source detection method according to appendix 14, wherein
(Appendix 17)
In the process of determining the phase difference region where the sound source exists, the computer extracts a predetermined number of phase difference regions from the plurality of phase difference regions in descending order of the number of frequency components, and determines the phase difference region where the sound source exists based on the number of frequency components in the extracted phase difference regions.
The sound source detection method according to appendix 14.
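The extraction step can be sketched as follows. The decision rule applied to the extracted regions (a fixed `min_count`) is a hypothetical choice; the text does not specify how the counts of the extracted regions are evaluated.

```python
def top_regions(counts, n):
    """Indices of the n regions with the most frequency components, in descending order."""
    order = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)
    return order[:n]

def source_regions_from_top(counts, n, min_count):
    """Among the n extracted regions, keep those with at least min_count components
    (hypothetical rule), returned in ascending region order."""
    return sorted(i for i in top_regions(counts, n) if counts[i] >= min_count)
```

Restricting the decision to the top few regions keeps sparsely populated regions, which mostly contain noise components, out of the source decision.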
(Appendix 18)
In the process of calculating the phase difference, the computer estimates, for each frequency component, the amplitude level of stationary noise included in the audio signal acquired from the microphone, and calculates the phase difference only for those frequency components in the frequency spectrum of the audio signal whose amplitude spectrum is larger than the amplitude level of the stationary noise.
The sound source detection method according to appendix 14.
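A minimal sketch of this noise gating is given below. The exponential smoothing used as the noise estimate, the `margin` factor, and the function names are assumptions for illustration; the text does not state how the stationary noise level is estimated. The spectra are plain lists of complex values, one per frequency bin.

```python
import cmath

def update_noise(noise, amplitudes, alpha=0.9):
    """Assumed estimator: exponential smoothing of the amplitude spectrum per bin."""
    return [alpha * n + (1 - alpha) * a for n, a in zip(noise, amplitudes)]

def phase_diffs_above_noise(spec1, spec2, noise, margin=1.0):
    """Phase difference between two microphone spectra, computed only for bins
    whose amplitude in spec1 exceeds margin times the estimated noise level."""
    result = {}
    for k, (x1, x2) in enumerate(zip(spec1, spec2)):
        if abs(x1) > margin * noise[k]:
            # phase(x1 * conj(x2)) equals phase(x1) - phase(x2), wrapped to (-pi, pi]
            result[k] = cmath.phase(x1 * x2.conjugate())
    return result
```

Bins at or below the noise level are simply omitted from the result, so they are never counted in any phase difference region.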
(Appendix 19)
The computer further executes a process of detecting an abnormal state in the detection target space of the sound source based on a detection result of the phase difference region where the sound source exists and the number of sound sources. The described sound source detection method.
(Appendix 20)
In the process of detecting the abnormal state, the computer detects the abnormal state based on the duration of a state in which no sound source is detected.
The sound source detection method according to appendix 19.
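Duration-based detection of this kind can be sketched with a simple run-length counter. The class name `DurationMonitor` and the use of a frame count as the time limit are hypothetical; the same counter could equally track the state in which the number of sound sources has increased.

```python
class DurationMonitor:
    """Counts how many consecutive frames a condition has held."""

    def __init__(self, limit_frames):
        self.limit = limit_frames
        self.run = 0

    def update(self, condition):
        """Feed one frame; return True once the condition has held for limit_frames in a row."""
        self.run = self.run + 1 if condition else 0
        return self.run >= self.limit
```

For example, calling `update(True)` on every frame in which no sound source is detected, and `update(False)` otherwise, yields True only after the silence has lasted for the configured number of frames.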
(Appendix 21)
In the process of detecting the abnormal state, the computer detects the abnormal state based on the duration of a state in which the number of sound sources has increased in a phase difference region where the sound source exists.
The sound source detection method according to appendix 19.
(Appendix 22)
In the process of detecting the abnormal state, the computer detects an abnormal state in the detection target space of the sound source based on the detection result of the phase difference region where the sound source exists and of the number of sound sources, and on sound source information that is stored in a storage unit and includes information indicating a phase difference region where a sound source exists.
The sound source detection method according to appendix 19.
(Appendix 23)
In the process of detecting the abnormal state, the computer determines that the state is abnormal when a state in which no sound source is detected from a phase difference region other than the phase difference region where the sound source exists in the sound source information continues for a predetermined time or more.
The sound source detection method according to appendix 22.
(Appendix 24)
In the process of detecting the abnormal state, the computer determines that the state is abnormal when there is a phase difference region, other than the phase difference region where the sound source exists in the sound source information, in which a sound source continues to be detected for a predetermined time or more.
The sound source detection method according to appendix 22.
(Appendix 25)
In the process of detecting the abnormal state, the computer determines that the state is abnormal when, among the phase difference regions where the sound source exists in the sound source information, there is a phase difference region in which no sound source is detected, and the state in which no sound source is detected in that phase difference region continues for a predetermined time or more.
The sound source detection method according to appendix 22.
(Appendix 26)
In the process of detecting the abnormal state, the computer determines that the state is abnormal when, among the phase difference regions where the sound source exists in the sound source information, there is a phase difference region in which the state of an increased number of sound sources has continued for a predetermined time or more.
The sound source detection method according to appendix 22.
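The comparison against stored sound source information can be sketched as below. It is only an illustration under stated assumptions: the stored information is reduced to a set of region indices, time is measured in frames, and the function name and the alert labels "unexpected" and "missing" are hypothetical. The sketch covers two of the cases above: a source that persists in a region not listed in the stored information, and a listed region in which no source is detected for a sustained period.

```python
def check_against_registered(registered, frames, limit):
    """registered: set of region indices listed in the stored sound source information.
    frames: list of sets, each holding the regions where a source was detected in that frame.
    Returns (frame_index, kind) alerts once a condition has lasted limit frames."""
    unexpected_run = {}
    missing_run = {r: 0 for r in registered}
    alerts = []
    for t, detected in enumerate(frames):
        # Sources detected outside the registered regions.
        for r in detected - registered:
            unexpected_run[r] = unexpected_run.get(r, 0) + 1
        for r in list(unexpected_run):
            if r not in detected:
                del unexpected_run[r]
        # Registered regions in which no source is currently detected.
        for r in registered:
            missing_run[r] = 0 if r in detected else missing_run[r] + 1
        if any(v >= limit for v in unexpected_run.values()):
            alerts.append((t, "unexpected"))
        if any(v >= limit for v in missing_run.values()):
            alerts.append((t, "missing"))
    return alerts
```

Keeping a separate run length per region means that a brief dropout in one region does not reset the timing of an anomaly developing in another.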
(Appendix 27)
A plurality of audio signals acquired from a plurality of microphones of a microphone array device are converted into frequency spectra, and a phase difference is calculated for each frequency component,
Based on a plurality of phase difference regions divided according to the direction of arrival of the sound arriving at the microphones, and on the calculated phase difference for each frequency component, the number of frequency components in each phase difference region is calculated,
Based on the calculated number of frequency components in each phase difference region, the phase difference region where a sound source exists is determined,
Based on the number of frequency components in each phase difference region, the ratio of the number of frequency components among the plurality of phase difference regions in the audio signal is calculated,
Based on the temporal change in the ratio of the number of frequency components, the number of sound sources in the phase difference region where the sound source exists is determined,
A program that causes a computer to execute the above processing.

Description of Symbols
1 Sound source detection device
101 Audio signal reception unit
102 Conversion unit
103 Phase difference calculation unit
104 Sound source region / sound source number detection unit
104a Frequency component number calculation unit
104b Phase difference component ratio calculation unit
104c Component ratio transition information calculation unit
104d Component ratio holding unit
105 Detection result output unit
105a Duration measurement unit
105b Abnormal state detection unit
105c Attention signal output unit
105d Detected sound source information holding unit
106 Sound source information management unit
106a Sound source information registration unit
106b Sound source information holding unit
2 Microphone array
201, 202 Microphone
401, 402 Sound source
7 Computer
701 CPU
702 Main storage device
703 Auxiliary storage device
704 Input device
705 Output device
706 Interface device
707 Communication device
708 Storage medium drive device
710 Bus
8 Network
9 Communication terminal

Claims

A phase difference calculation unit that calculates a phase difference for each frequency component in the audio signals, based on the frequency spectra of a plurality of audio signals acquired from a plurality of microphones of a microphone array device;
A sound source detection unit that detects, based on the calculated phase difference for each frequency component, the direction in which a sound source exists and the number of sound sources,
The sound source detection unit includes:
A frequency component number calculation unit that calculates the number of frequency components in each phase difference region, based on a plurality of phase difference regions divided according to the direction of arrival of sound arriving at the microphones and on the calculated phase difference for each frequency component;
A phase difference component ratio calculation unit that calculates the ratio of the number of frequency components among the plurality of phase difference regions in the audio signal, based on the calculated number of frequency components in each phase difference region;
A transition information calculation unit that calculates transition information of the ratio of the number of frequency components, based on a temporal change in that ratio;
The sound source detection unit determines the phase difference region where a sound source exists based on the number of frequency components in each phase difference region, and determines the number of sound sources in that phase difference region based on the transition information of the ratio of the number of frequency components,
A sound source detection device characterized by the above.

The sound source detection unit determines that the number of sound sources in the phase difference region has changed when the amount of change in the ratio of the number of frequency components in the phase difference region where the sound source exists exceeds a predetermined amount of change.
The sound source detection device according to claim 1.

The sound source detection unit determines that the number of sound sources in the phase difference region has increased when a ratio of the number of frequency components in the phase difference region in which the sound source exists is equal to or greater than a predetermined threshold.
The sound source detection device according to claim 1.

Further comprising a stationary noise estimation unit that estimates the amplitude level of stationary noise included in the audio signal acquired from the microphone,
The phase difference calculation unit calculates the phase difference of those frequency components in the frequency spectrum of the audio signal whose amplitude spectrum is larger than the amplitude level of the stationary noise,
The sound source detection device according to claim 1.

Further comprising an abnormal state detection unit that detects an abnormal state in the detection target space of the sound source, based on the detection result of the phase difference region where the sound source exists and of the number of sound sources,
The sound source detection device according to claim 1.

Further comprising a sound source information holding unit that holds sound source information including information indicating a phase difference region where a sound source exists,
The abnormal state detection unit detects an abnormal state in the detection target space of the sound source based on the detection result of the phase difference region where the sound source exists and of the number of sound sources, and on the sound source information,
The sound source detection device according to claim 5.

A plurality of audio signals acquired from a plurality of microphones of a microphone array device are converted into frequency spectra, and a phase difference is calculated for each frequency component,
Based on a plurality of phase difference regions divided according to the direction of arrival of the sound arriving at the microphones, and on the calculated phase difference for each frequency component, the number of frequency components in each phase difference region is calculated,
Based on the calculated number of frequency components in each phase difference region, the phase difference region where a sound source exists is determined,
Based on the number of frequency components in each phase difference region, the ratio of the number of frequency components among the plurality of phase difference regions in the audio signal is calculated,
Based on the temporal change in the ratio of the number of frequency components, the number of sound sources in the phase difference region where the sound source exists is determined,
A sound source detection method characterized in that a computer executes the above processing.

A plurality of audio signals acquired from a plurality of microphones of a microphone array device are converted into frequency spectra, and a phase difference is calculated for each frequency component,
Based on a plurality of phase difference regions divided according to the direction of arrival of the sound arriving at the microphones, and on the calculated phase difference for each frequency component, the number of frequency components in each phase difference region is calculated,
Based on the calculated number of frequency components in each phase difference region, the phase difference region where a sound source exists is determined,
Based on the number of frequency components in each phase difference region, the ratio of the number of frequency components among the plurality of phase difference regions in the audio signal is calculated,
Based on the temporal change in the ratio of the number of frequency components, the number of sound sources in the phase difference region where the sound source exists is determined,
A program that causes a computer to execute the above processing.