JP2000099009A

JP2000099009A - Audio signal encoding method

Info

Publication number: JP2000099009A
Application number: JP10283453A
Authority: JP
Inventors: Toshio Motegi; 敏雄茂出木
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 1998-09-18
Filing date: 1998-09-18
Publication date: 2000-04-07
Anticipated expiration: 2018-09-18
Also published as: JP4037542B2

Abstract

(57) [Abstract]
[Problem] To efficiently convert an audio signal into nonlinear code data such as MIDI data.
[Solution] An audio signal to be encoded is PCM-coded and captured as audio data, and a plurality of unit sections d1 to d5 are set on the time axis (FIG. (a)). A Fourier transform is performed on each unit section to obtain a spectrum (FIG. (b)). At this time, 128 note numbers (0 to 127) are defined at equal intervals on a logarithmic frequency axis f, and the effective intensity E is obtained only for the frequencies corresponding to these 128 note numbers (FIG. (c)). P note numbers n(d1,1) to n(d1,P) are extracted in descending order of effective intensity and placed on P tracks at the time positions corresponding to each unit section. The note numbers on each track are expressed as MIDI data, and the original audio signal is reproduced as P-channel stereo sound.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for encoding an audio signal, and more particularly to a technique for encoding an audio signal given as a time-series intensity signal and then decoding and reproducing it. In particular, the present invention is suited to efficiently converting a general audio signal into MIDI-format code data, and is expected to find application in the various industrial fields in which sound is recorded.

[0002]

[Prior Art] Among techniques for encoding audio signals, PCM (Pulse Code Modulation) is the most widespread, and it is currently in wide use as the recording format for audio CDs, DAT, and the like. The basic principle of PCM is to sample an analog audio signal at a predetermined sampling frequency and to quantize the signal intensity at each sampling instant so that it is represented as digital data. The higher the sampling frequency and the number of quantization bits, the more faithfully the original sound can be reproduced. However, the higher the sampling frequency and the number of quantization bits, the larger the amount of information required. For this reason, ADPCM (Adaptive Differential Pulse Code Modulation), which encodes only the differences between successive signal values, is also used as a way of reducing the amount of information as far as possible.

[0003] Meanwhile, the MIDI (Musical Instrument Digital Interface) standard, which grew out of the idea of encoding the sounds of electronic musical instruments, has come into widespread use along with the spread of personal computers. Code data conforming to the MIDI standard (hereinafter, MIDI data) is basically data describing the operations of a musical performance, namely which key of an instrument was played and how strongly; the MIDI data itself contains no actual sound waveform. Consequently, reproducing the actual sound requires a separate MIDI sound source that stores the waveforms of instrument sounds. Nevertheless, MIDI requires far less information than recording sound by the PCM method described above, and its high encoding efficiency has attracted attention. Encoding and decoding techniques based on the MIDI standard are now widely adopted in personal computer software for instrument performance, instrument practice, and composition, and are also widely used in fields such as karaoke and game sound effects.

[0004]

[Problems to Be Solved by the Invention] As described above, when an audio signal is encoded by the PCM method, securing sufficient sound quality makes the amount of information enormous and the data-processing load correspondingly heavy. In practice, therefore, sound quality usually has to be compromised to some degree in order to keep the amount of information manageable. Encoding based on the MIDI standard can of course reproduce sound of sufficient quality from a very small amount of information, but, as noted above, the MIDI standard was originally intended for encoding the operations of a musical performance and therefore cannot be applied to general sound at large. In other words, creating MIDI data requires either actually playing an instrument or preparing the information of a musical score.

[0005] Thus, both the conventional PCM method and the MIDI method have advantages and disadvantages as audio signal encoding methods, and neither can secure sufficient sound quality for general sound with a small amount of information. Yet the demand for efficient encoding of general sound is growing steadily. This demand has long been strong in fields that handle human speech and singing, so-called vocal sound. In fields such as language education, vocal music education, and criminal investigation, for example, a technique for efficiently encoding vocal audio signals is eagerly awaited. To meet this demand, the specification of Japanese Patent Application No. H9-273949 proposes a novel encoding method that can make use of MIDI data. In that method, a plurality of unit sections are set along the time axis of the audio signal, a Fourier transform is performed on each unit section to obtain a spectrum, and MIDI data corresponding to this spectrum is created. However, MIDI data is inherently data corresponding to musical notes and is nonlinear with respect to frequency. The conventional, general Fourier transform technique, by contrast, presupposes a spectrum on a linear frequency axis. As a result, when the conventional Fourier transform technique is used, conversion into nonlinear code data such as MIDI data cannot be performed efficiently.

[0006] Accordingly, an object of the present invention is to provide an audio signal encoding method capable of efficiently performing conversion into nonlinear code data such as MIDI data.

[0007]

[Means for Solving the Problems] (1) A first aspect of the present invention is an audio signal encoding method for encoding an audio signal given as a time-series intensity signal, the method comprising: a section setting step of setting a plurality of unit sections on the time axis of the audio signal to be encoded; a code definition step of discretely defining M measurement points on the frequency axis and defining a total of M codes, one corresponding to each of the M measurement points; an intensity calculation step of obtaining, for each unit section, the spectral intensity of the frequency components contained in the audio signal of that unit section that correspond to the M measurement points; and an encoding step of extracting, for each unit section and on the basis of the spectral intensities obtained in the intensity calculation step, P representative codes that represent that unit section from among all M codes, and expressing the audio signal of each unit section by the extracted representative codes and their spectral intensities.

[0008] (2) A second aspect of the present invention is the audio signal encoding method according to the first aspect, wherein, in the code definition step, the M measurement points are discretely defined so as to be equally spaced on a logarithmic frequency axis.

[0009] (3) A third aspect of the present invention is the audio signal encoding method according to the second aspect, wherein, in the code definition step, the note numbers used in MIDI data are used as the M codes, and, in the encoding step, the audio signal of each unit section is expressed by MIDI-format code data consisting of data indicating the note number extracted as a representative code, a velocity determined on the basis of its spectral intensity, and a delta time determined on the basis of the length of the unit section.

[0010] (4) A fourth aspect of the present invention is the audio signal encoding method according to any of the first to third aspects, wherein, in the intensity calculation step, the spectral intensity S(m) at the m-th measurement point, corresponding to frequency f(m), is calculated by computing the correlation of the signal with M sine functions and M cosine functions whose frequencies correspond to the respective measurement points.

[0011] (5) A fifth aspect of the present invention is the audio signal encoding method according to any of the first to fourth aspects, wherein, in the intensity calculation step, a weight function defining a weighting over the length of the unit section is prepared, and the spectral intensity is obtained after multiplying the audio signal in the unit section by this weight function.

[0012] (6) A sixth aspect of the present invention is the audio signal encoding method according to any of the first to fifth aspects, wherein, in the section setting step, the unit sections are set so that adjacent unit sections partially overlap on the time axis.

[0013] (7) A seventh aspect of the present invention is the audio signal encoding method according to any of the first to sixth aspects, wherein the audio signal to be encoded is sampled at a predetermined sampling frequency F and captured as audio data in which the amplitude of the x-th sample is A(x), the unit sections are set on the captured audio data, and, in the intensity calculation step, for a unit section that begins at the h-th sample and contains a total of K samples, the spectral intensity S(m) at the m-th measurement point, corresponding to frequency f(m), is calculated using a predetermined weight function W(k) according to

$$S(m) = \frac{1}{K} \sum_{k=0}^{K-1} W(k)\, A(h+k)\, \exp\!\left(-j\, 2\pi f(m)\, \frac{h+k}{F}\right).$$

[0014] (8) An eighth aspect of the present invention is the audio signal encoding method according to any of the first to sixth aspects, wherein the audio signal to be encoded is sampled at a predetermined sampling frequency F and captured as audio data in which the amplitude of the x-th sample is A(x), the unit sections are set on the captured audio data, and, in the intensity calculation step, for a unit section that begins at the h-th sample and contains a total of K samples, the spectral intensity S(m) at the m-th measurement point, corresponding to frequency f(m), is calculated using a predetermined weight function W(k) according to

$$S(m) = \frac{1}{K} \sum_{k=0}^{K-1} W(k)\, A(h+k)\, \exp\!\left(-j\, 2\pi f(m)\, \frac{k}{F}\right).$$
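The formula of the eighth aspect can be sketched in Python as follows. This is only an illustrative sketch using the standard library: the function name `spectral_intensity` and its argument names are hypothetical, and when no weight function is supplied a weight of 1 is assumed for every sample.

```python
import cmath
import math


def spectral_intensity(samples, h, K, f_m, F, weight=None):
    """Complex spectral intensity S(m) of the unit section samples[h:h+K].

    Implements S(m) = (1/K) * sum_{k=0}^{K-1} W(k) * A(h+k) * exp(-j*2*pi*f_m*k/F).
    `weight` is an optional function W(k); if omitted, W(k) = 1 is assumed.
    """
    total = 0j
    for k in range(K):
        w = 1.0 if weight is None else weight(k)
        total += w * samples[h + k] * cmath.exp(-2j * math.pi * f_m * k / F)
    return total / K
```

With a unit-amplitude cosine at exactly f(m) that spans a whole number of cycles in the section, the magnitude of the returned value should come out close to 0.5, while a measurement point one octave away should give a value close to 0.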

[0015] (9) A ninth aspect of the present invention is the audio signal encoding method according to any of the first to eighth aspects, wherein, in the encoding step, the P representative codes extracted for each unit section are distributed among a plurality of tracks, and, when representative codes placed adjacent to each other on the same track satisfy a predetermined similarity condition, those adjacent representative codes are merged into a single representative code.
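The merging of adjacent representative codes on one track might be sketched as follows. The text leaves the similarity condition open, so this sketch assumes, purely for illustration, that two adjacent codes are "similar" when their note numbers are equal, and that the merged code keeps the larger intensity; the name `merge_track` and the tuple layout are hypothetical.

```python
def merge_track(events):
    """Merge runs of adjacent, similar representative codes on one track.

    `events` is a list of (note_number, intensity) pairs, one per unit section.
    Returns a list of (note_number, intensity, span), where span is the number
    of consecutive unit sections covered by the merged code.
    Assumed similarity condition: identical note numbers.
    """
    merged = []
    for note, intensity in events:
        if merged and merged[-1][0] == note:
            prev_note, prev_intensity, span = merged[-1]
            # Assumption: the merged code keeps the larger of the two intensities.
            merged[-1] = (prev_note, max(prev_intensity, intensity), span + 1)
        else:
            merged.append((note, intensity, 1))
    return merged
```

A merged code that spans several unit sections would then correspond to one longer note rather than several repeated short ones.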

[0016] (10) A tenth aspect of the present invention is the audio signal encoding method according to the ninth aspect, wherein, when the P representative codes are distributed among the plurality of tracks, the order of distribution is adjusted so as to increase the probability that representative codes placed adjacent to each other on the same track satisfy the similarity condition.

[0017] (11) An eleventh aspect of the present invention is a computer-readable recording medium on which is recorded a program for executing the audio signal encoding method according to any of the first to tenth aspects.

[0018]

[Embodiments of the Invention] The present invention is described below on the basis of the illustrated embodiments.

[0019] §1. Basic principle of the audio signal encoding method according to the present invention

First, the basic principle of the audio signal encoding method according to the present invention is described with reference to FIG. 1. Suppose that an analog audio signal is given as a time-series intensity signal, as shown in FIG. 1(a). In the illustrated example, the signal is plotted with time t on the horizontal axis and amplitude (intensity) on the vertical axis. The first step is to capture this analog audio signal as digital audio data. This can be done with the conventional PCM method: the analog audio signal is sampled at a predetermined sampling period and each amplitude is converted into digital data using a predetermined number of quantization bits. For convenience of explanation, the waveform of the audio data digitized by PCM is taken here to be the same as the waveform of the analog audio signal in FIG. 1(a).

[0020] Next, a plurality of unit sections are set on the time axis of the audio signal to be encoded. In the example shown in FIG. 1(a), six times t1 to t6 are defined at equal intervals on the time axis t, and five unit sections d1 to d5 are set with these times as their start and end points (a more practical way of setting sections is described later).

[0021] Once the unit sections have been set, a Fourier transform is performed on the audio signal of each unit section to create a spectrum (in practice, as described in §3, a technique different from the ordinary Fourier transform is used). It is desirable to filter the extracted audio signal with a weight function such as a Hanning window before applying the Fourier transform. The Fourier transform generally assumes that the same signal continues indefinitely before and after the extracted section, so when no weight function is used, high-frequency noise often appears in the resulting spectrum. In such cases it is desirable to use a weight function whose weights fall to 0 at both ends of the section, such as the Hanning window function. With a unit section length of L, the Hanning window function H(k) is given for k = 1, ..., L by $H(k) = 0.5 - 0.5\cos(2\pi k / L)$.
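As a small illustration, the Hanning window function H(k) defined above can be tabulated in Python as follows. The function name `hanning_weights` is hypothetical; the index range k = 1, ..., L follows the text.

```python
import math


def hanning_weights(L):
    """Return [H(1), ..., H(L)] with H(k) = 0.5 - 0.5 * cos(2*pi*k / L)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / L) for k in range(1, L + 1)]
```

The weights rise to 1 at the middle of the section (k = L/2 when L is even) and fall back to 0 at k = L, which is the tapering toward the section ends that the text asks for.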

[0022] FIG. 1(b) shows an example of the spectrum created for unit section d1. In this spectrum, the frequency f on the horizontal axis indicates the frequency components (0 to F, where F is the sampling frequency) contained in the audio signal of unit section d1, and the complex intensity A on the vertical axis indicates the complex intensity of each frequency component.

[0023] Next, M codes are discretely defined in correspondence with the frequency axis f of this spectrum. In this example, the note numbers n used in MIDI data serve as the codes, and 128 codes, n = 0 to 127, are defined. The note number n is a parameter indicating the pitch of a note; for example, note number n = 69 denotes the "la" (A3) at the center of a piano keyboard and corresponds to a 440 Hz tone. Since each of the 128 note numbers is associated with a predetermined frequency in this way, the 128 note numbers n are discretely defined at predetermined positions on the frequency axis f of the spectrum.
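The correspondence between note numbers and frequencies can be sketched as follows. The text only fixes n = 69 at 440 Hz; the sketch additionally assumes the usual 12-tone equal temperament, in which 12 note numbers make up one octave. The names `note_to_frequency`, `ref_note`, and `ref_hz` are hypothetical.

```python
def note_to_frequency(n, ref_note=69, ref_hz=440.0):
    """Frequency in Hz of note number n.

    Assumption: 12-tone equal temperament, so each step of 12 note numbers
    doubles the frequency, anchored at ref_note = 69 -> 440 Hz.
    """
    return ref_hz * 2.0 ** ((n - ref_note) / 12.0)


# The 128 measurement frequencies f(0) ... f(127), equally spaced on a log axis.
NOTE_FREQUENCIES = [note_to_frequency(n) for n in range(128)]
```

Under this assumption, adjacent measurement points differ by a constant frequency ratio of 2^(1/12), which is what makes them equally spaced on a logarithmic axis.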

[0024] Because the note number n represents a logarithmic musical scale, in which the frequency doubles with each octave, it does not correspond linearly to the frequency axis f. Here, therefore, the frequency axis f is expressed on a logarithmic scale, and an intensity graph is created in which the note numbers n are defined on this logarithmic axis. FIG. 1(c) shows the intensity graph created in this way for unit section d1. The horizontal axis of this intensity graph is the horizontal axis of the spectrum shown in FIG. 1(b) converted to a logarithmic scale, with note numbers n = 0 to 127 plotted at equal intervals. The vertical axis of the intensity graph is the complex intensity A of the spectrum shown in FIG. 1(b) converted into an effective intensity E, and shows the intensity at the position of each note number n. In general, the complex intensity A obtained by a Fourier transform is expressed by a real part R and an imaginary part I, and the effective intensity E can be obtained from them as $E = (R^2 + I^2)^{1/2}$.
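In Python, the conversion from a complex intensity to an effective intensity is simply the magnitude of a complex number. A minimal sketch, with the hypothetical name `effective_intensity`:

```python
import math


def effective_intensity(A):
    """Effective intensity E = (R^2 + I^2)^(1/2) of a complex intensity A = R + jI."""
    return math.hypot(A.real, A.imag)
```

The built-in `abs(A)` gives the same value for a `complex` argument; `math.hypot` is used here only to mirror the formula in the text.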

[0025] The intensity graph obtained in this way for unit section d1 can be regarded as a graph showing, as effective intensities, the proportions of the vibration components corresponding to note numbers n = 0 to 127 among the vibration components contained in the audio signal of unit section d1. On the basis of the effective intensities shown in this graph, P note numbers are selected from among all M note numbers (M = 128 in this example), and these P note numbers n are extracted as the representative codes that represent unit section d1. For convenience of explanation, the case P = 3 is shown here, in which three note numbers are extracted as representative codes from the 128 candidates. For example, if extraction follows the criterion "extract P codes from the candidates in descending order of intensity", then in the example of FIG. 1(c) note number n(d1,1) is extracted as the first representative code, note number n(d1,2) as the second, and note number n(d1,3) as the third.
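The criterion "extract P codes in descending order of intensity" can be sketched as follows. The function name `extract_representatives` and the list-of-pairs return format are hypothetical; ties are resolved in favor of the lower note number, which is an assumption the text does not address.

```python
def extract_representatives(intensities, P):
    """Return the P (note_number, effective_intensity) pairs with the largest intensity.

    `intensities[n]` is the effective intensity at note number n.
    Pairs are returned in descending order of intensity; on ties the lower
    note number comes first (an assumption, since sorting is stable).
    """
    ranked = sorted(range(len(intensities)), key=lambda n: intensities[n], reverse=True)
    return [(n, intensities[n]) for n in ranked[:P]]
```

For P = 3, the three returned pairs play the roles of (n(d1,1), e(d1,1)), (n(d1,2), e(d1,2)), and (n(d1,3), e(d1,3)).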

[0026] Once the P representative codes have been extracted, the audio signal of unit section d1 can be expressed by these representative codes and their effective intensities. In the above example, if the effective intensities of note numbers n(d1,1), n(d1,2), and n(d1,3) in the intensity graph of FIG. 1(c) are e(d1,1), e(d1,2), and e(d1,3), respectively, the audio signal of unit section d1 can be expressed by the following three data pairs.

[0027]
n(d1,1), e(d1,1)
n(d1,2), e(d1,2)
n(d1,3), e(d1,3)

The processing for unit section d1 has been described above. The same processing is performed separately for each of unit sections d2 to d5, yielding data indicating the representative codes and their intensities. For unit section d2, for example, the following three data pairs are obtained.

n(d2,1), e(d2,1)
n(d2,2), e(d2,2)
n(d2,3), e(d2,3)

The original audio signal can be encoded by the data obtained in this way for each unit section.

[0028] FIG. 2 is a conceptual diagram of encoding by the method described above. FIG. 2(a) shows, as in FIG. 1(a), five unit sections d1 to d5 set on the original audio signal, and FIG. 2(b) shows the code data obtained for each unit section in the form of musical notes. In this example, three representative codes are extracted for each unit section (P = 3), and the data for these representative codes is stored separately in three tracks T1 to T3. For example, the representative codes n(d1,1), n(d1,2), and n(d1,3) extracted for unit section d1 are stored in tracks T1, T2, and T3, respectively. Note that FIG. 2(b) is only a conceptual diagram showing the code data obtained by the present invention in the form of notes; in practice, intensity data is attached to each note. Track T1, for example, stores the data n(d1,1), n(d2,1), n(d3,1), ... indicating pitch together with the data e(d1,1), e(d2,1), e(d3,1), ... indicating intensity.
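The arrangement of per-section data pairs into tracks amounts to a transposition, which might be sketched as follows. The name `distribute_to_tracks` and the nested-list layout are hypothetical; the sketch assumes that every unit section supplies exactly P pairs and that the p-th pair of each section always goes to track Tp.

```python
def distribute_to_tracks(sections):
    """Arrange per-section representative codes into P tracks.

    `sections[i]` is the list of P (note_number, intensity) pairs extracted
    for the i-th unit section. Track p collects the p-th pair of every
    section, in time order.
    """
    if not sections:
        return []
    P = len(sections[0])
    return [[section[p] for section in sections] for p in range(P)]
```

With P = 3, the first returned list corresponds to track T1, holding (n(d1,1), e(d1,1)), (n(d2,1), e(d2,1)), and so on.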

[0029] The encoding format used in the present invention need not be the MIDI format, but since MIDI is the most widespread format of this kind, MIDI-format code data is the most preferable in practice. In the MIDI format, "note-on" data and "note-off" data appear with "delta time" data interposed between them. "Note-on" data specifies a particular note number N and velocity V and instructs the start of a particular sound; "note-off" data specifies a particular note number N and velocity V and instructs the end of a particular sound. "Delta time" data indicates a predetermined time interval. The velocity V is a parameter indicating, for example, the speed at which a piano key is pressed down (the note-on velocity) or the speed at which the finger leaves the key (the note-off velocity), and thus indicates the strength of the operation that starts or ends a particular sound.
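For illustration, the byte layout of such events in a Standard MIDI File track can be sketched as follows. The status bytes (0x90 for note-on, 0x80 for note-off, with the channel in the low four bits) and the variable-length encoding of delta times come from the MIDI specifications, not from the text above; the function names `encode_delta_time` and `note_event` are hypothetical.

```python
def encode_delta_time(ticks):
    """Encode a non-negative tick count as a MIDI variable-length quantity."""
    if ticks < 0:
        raise ValueError("delta time must be non-negative")
    groups = [ticks & 0x7F]              # lowest 7 bits, continuation bit clear
    ticks >>= 7
    while ticks:
        groups.append((ticks & 0x7F) | 0x80)  # higher groups set the continuation bit
        ticks >>= 7
    return bytes(reversed(groups))


def note_event(delta, note, velocity, on=True, channel=0):
    """Delta time followed by a note-on (0x9n) or note-off (0x8n) message."""
    status = (0x90 if on else 0x80) | (channel & 0x0F)
    return encode_delta_time(delta) + bytes([status, note & 0x7F, velocity & 0x7F])
```

A note lasting one unit section would then be written as a note-on event followed by a note-off event whose delta time corresponds to the section length.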

[0030] In this embodiment, as described above, P note numbers n(di,1), n(di,2), ..., n(di,P) are obtained as representative codes for the i-th unit section di, together with their effective intensities e(di,1), e(di,2), ..., e(di,P). MIDI-format code data is then created as follows. First, the note numbers n(di,1), n(di,2), ..., n(di,P) are used unchanged as the note number N written in the "note-on" and "note-off" data. For the velocity V written in the "note-on" and "note-off" data, the effective intensities e(di,1), e(di,2), ..., e(di,P) are normalized to the range 0 to 1, and the square root of the normalized effective intensity E multiplied by 127 is used. That is, with Emax denoting the maximum value of the effective intensity E, the velocity is the value V given by $V = (E / E_{\max})^{1/2} \cdot 127$. Alternatively, a logarithm may be taken and the value V given by $V = \log(E / E_{\max}) \cdot 127 + 127$ (with V = 0 when V < 0) may be used as the velocity. The "delta time" data may be set according to the length of each unit section.
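Both velocity mappings can be sketched as follows. The text does not state the base of the logarithm or how V is converted to an integer, so this sketch assumes base 10 by default and rounds to the nearest integer; the names `velocity_sqrt` and `velocity_log` are hypothetical.

```python
import math


def velocity_sqrt(E, E_max):
    """V = (E / E_max)^(1/2) * 127, rounded to an integer (rounding is an assumption)."""
    return round(math.sqrt(E / E_max) * 127)


def velocity_log(E, E_max, base=10.0):
    """V = log(E / E_max) * 127 + 127, clamped to 0 when negative.

    Assumptions: base-10 logarithm by default, integer rounding, and V = 0
    for a non-positive intensity (where the logarithm is undefined).
    """
    if E <= 0:
        return 0
    v = math.log(E / E_max, base) * 127 + 127
    return max(0, round(v))
```

Under the base-10 assumption, the logarithmic mapping reaches 0 once E falls to one tenth of Emax, so it compresses a much narrower intensity range than the square-root mapping.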

[0031] In the embodiment described above, MIDI code data consisting of three tracks is thus obtained. If this MIDI code data is reproduced using three MIDI sound sources, the audio signal is reproduced as six-channel stereo sound.

[0032] The encoding process following the above procedure is in practice executed on a computer. A program implementing the encoding process according to the present invention can be supplied recorded on a computer-readable recording medium such as a magnetic disk or an optical disk, and the code data produced by the encoding process according to the present invention can likewise be supplied recorded on a computer-readable recording medium such as a magnetic disk or an optical disk.

【００３３】§2. A more practical section setting method
The basic principle of the acoustic signal encoding method according to the present invention has been described so far. More practical encoding methods are described below, beginning with a more practical technique for setting the sections. In the example shown in FIG. 2(a), five unit sections d1 to d5 are set with six times t1 to t6, defined at equal intervals on the time axis t, as their boundaries. When encoding is performed with such a section setting, discontinuities in the sound are likely to occur at the boundary times during reproduction. In practice, therefore, it is preferable to set the sections so that adjacent unit sections partially overlap on the time axis.

【００３４】FIG. 3(a) shows an example of such a partially overlapping section setting. The illustrated unit sections d1 to d4 all partially overlap, and when the processing described above is performed with this section setting, encoding is carried out as shown in the conceptual diagram of FIG. 3(b). In this example, the center of each unit section is taken as its reference position and each note is placed at that reference position, but the reference position relative to the unit section does not have to be the center. Comparing the conceptual diagram of FIG. 3(b) with that of FIG. 2(b) shows that the density of notes has increased. Setting overlapping sections in this way increases the amount of code data created, but it enables natural encoding in which no discontinuity of sound arises during reproduction.

【００３５】FIG. 4 shows a specific technique for setting sections that partially overlap on the time axis. In this example, the acoustic signal is sampled at a sampling frequency of 22 kHz and taken in as digital acoustic data, the section length L of each unit section is set to 1024 samples (about 47 msec), and the offset length ΔL, which indicates the shift from one unit section to the next, is set to 20 samples (about 0.9 msec). That is, for any i, the distance on the time axis between the start point of the i-th unit section and the start point of the (i+1)-th unit section is set to the offset length ΔL. For example, the first unit section d1 contains the 1st to 1024th samples, and the second unit section d2 contains the 21st to 1044th samples, shifted by 20 samples.
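The overlapping section setting of FIG. 4 can be sketched as a simple enumeration of sample ranges. The function name and the choice to drop a trailing partial section are assumptions made for illustration; sample indices are 1-based to match the text.

```python
def unit_sections(num_samples, section_len=1024, offset=20):
    """Return (first, last) 1-based inclusive sample indices of each unit section.

    Consecutive sections start `offset` samples apart, so neighbours share
    section_len - offset samples. Trailing partial sections are dropped
    (an assumption; the text does not discuss the end of the signal).
    """
    sections = []
    start = 1
    while start + section_len - 1 <= num_samples:
        sections.append((start, start + section_len - 1))
        start += offset
    return sections
```

For instance, `unit_sections(1064)` yields the three sections (1, 1024), (21, 1044) and (41, 1064), the first two of which are the sections d1 and d2 given in the text.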

【００３６】When sections that partially overlap on the time axis are set in this way, a large proportion of the samples are shared between adjacent unit sections, so the spectra obtained for the individual unit sections can be expected to show no meaningful difference. In the example above, comparing the first unit section d1 with the second unit section d2, the 21st to 1024th samples are used identically in both sections, and the difference between the two depends on only 20 samples. Fortunately, in the Fourier transform processing described in §3, a phase difference corresponding to 20 samples arises, so the complex intensity A differs substantially between the two unit sections. The effective intensity E, however, is expected to differ very little. If the spectra of adjacent unit sections do not differ sufficiently, the encoding cannot follow a rapidly changing acoustic signal, and the time resolution consequently degrades. To deal with this problem, a measure should be taken so that a difference of only 20 samples produces a large change at the input of the Fourier transform.

【００３７】The present inventor therefore devised a modification of the weight function mentioned in §1 that emphasizes the 20 samples that change. The well-known Hanning window function mentioned earlier tends, if anything, to suppress the variation between adjacent sections, so it is counterproductive with respect to the problem described above. A function was therefore devised that emphasizes the 20 samples while retaining the Hanning window's property that the weight decreases toward both ends of the section, and it was applied in practice. Specifically, with the section length of the unit section denoted L and the offset length denoted ΔL, α and β are defined as

$$\alpha = \frac{L}{2} - \frac{\Delta L}{2}, \qquad \beta = \frac{L}{2} + \frac{\Delta L}{2},$$

which defines a near-center section [α, β] (a section of width ΔL located at the center of the unit section), and the following improved window function W(k) is used as the weight function:

$$W(k) = \begin{cases} 0.5 - 0.5\cos\left(\dfrac{\pi k}{2\alpha}\right) & k = 1, \dots, \alpha \\[2ex] 0.5 - 0.5\cos\left(\dfrac{\pi (k - \alpha)}{\Delta L} + \dfrac{\pi}{2}\right) & k = \alpha, \dots, \beta \\[2ex] 0.5 - 0.5\cos\left(\dfrac{\pi (k - \beta)}{2\alpha} + \dfrac{3\pi}{2}\right) & k = \beta, \dots, L \end{cases}$$

This improved window function W(k) is a distribution function narrowed so that its half-value width is exactly ΔL. Experiments using this function confirmed that it is sufficiently effective.
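The improved window function W(k) can be evaluated directly from its three-piece definition. The sketch below is illustrative only: the function name and default arguments (L = 1024, ΔL = 20, taken from the example in FIG. 4) are assumptions. The pieces agree at the shared boundaries k = α and k = β, so which piece is used there does not matter.

```python
import math

def improved_window(k, section_len=1024, offset=20):
    """Evaluate the improved window W(k) for sample position k in [0, section_len]."""
    alpha = section_len / 2 - offset / 2
    beta = section_len / 2 + offset / 2
    if k <= alpha:
        # Slow rise from 0 at the section start to 0.5 at k = alpha.
        phase = math.pi * k / (2 * alpha)
    elif k <= beta:
        # Narrow peak of width offset around the section center (W = 1 at the center).
        phase = math.pi * (k - alpha) / offset + math.pi / 2
    else:
        # Slow fall from 0.5 at k = beta to 0 at the section end.
        phase = math.pi * (k - beta) / (2 * alpha) + 3 * math.pi / 2
    return 0.5 - 0.5 * math.cos(phase)
```

With the default values, α = 502 and β = 522, so W equals 0.5 at both ends of the near-center section and 1 at k = 512, which is the half-value width of ΔL stated in the text.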

【００３８】§3. Efficient method of computing spectral intensity
According to the principle explained with reference to FIG. 1, the basic procedure of the encoding method according to the present invention is as follows. First, as shown in FIG. 1(a), a plurality of unit sections d1, d2, d3, ... are set on the time axis of the acoustic data. A Fourier transform is applied to the acoustic data in section d1 to obtain a spectrum such as that shown in FIG. 1(b). Then, as shown in FIG. 1(c), the acoustic signal of section d1 is represented by several codes n(d1,1), n(d1,2), n(d1,3) corresponding to the peak frequencies of this spectrum. This section describes an efficient method of computing a spectrum such as that shown in FIG. 1(b).

【００３９】To obtain a spectrum such as that of FIG. 1(b) from a signal with oscillating components such as that of FIG. 1(a), a Fourier transform is generally used, and in practice the computation is carried out with the fast Fourier transform (FFT). However, the general Fourier transform presupposes a spectrum on a linear frequency axis and is not necessarily suited to conversion into nonlinear code data such as MIDI data. The reasons are as follows.

【００４０】Consider a Fourier spectrum on a linear scale, as shown in FIG. 5. This Fourier spectrum is a graph with frequency f on a linear scale along the horizontal axis and spectral intensity along the vertical axis. M measurement points are defined discretely at equal intervals on the horizontal (frequency) axis, and the spectral intensity at each measurement point is shown as a bar. The rows beneath the graph give the number of each measurement point and the frequency value corresponding to each measurement point. In this example, the acoustic signal is taken in as data at a sampling frequency F = 22.05 kHz, and the number of measurement points is set to M = 1024. Accordingly, a spectral intensity corresponding to the length of the bar is obtained at each of a total of 1024 measurement points, from the 0th measurement point at frequency f = 0 to the 1023rd measurement point at frequency f = 11014 Hz (approximately 1/2 of the sampling frequency F). In a general Fourier transform, spectral intensities are obtained in this way for a large number of measurement points defined at equal intervals on a linear frequency axis.

【００４１】However, a spectrum whose intensities are obtained at measurement points defined at equal intervals on a linear frequency axis, as in FIG. 5, is not efficient to use for conversion into a code system that is nonlinear with respect to frequency, such as MIDI data. FIG. 6 redraws the spectrum of FIG. 5 with the frequency axis on a logarithmic scale. The rows beneath the graph give the number of each measurement point and the note number (corresponding to log f) associated with each measurement point. The number of measurement points, M = 1024, is the same as in FIG. 5, but because the frequency axis is logarithmic, the measurement points are not equally spaced along the horizontal axis. In other words, the measurement points are sparse in the low-frequency region and become progressively denser toward the high-frequency region.

【００４２】In the low-frequency region of the example in FIG. 6, note number n = 4 is assigned to the first measurement point, n = 16 to the second, and n = 24 to the third. The note numbers lying between these have no corresponding measurement point, so no spectral intensity is obtained for them, and the result resembles a comb with missing teeth. Thus, with a sampling frequency F = 22.05 kHz and M = 1024 measurement points, intensities cannot be defined for note numbers n = 5 to 15 and 17 to 23. The gaps could of course be eliminated by increasing the number of measurement points beyond M = 1024, but performing the computation for so many measurement points is itself inefficient.
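The "comb with missing teeth" effect can be reproduced numerically by assigning each equally spaced measurement point to its nearest note number. This is a rough sketch under stated assumptions: the function name is hypothetical, the points are assumed to be spaced (F/2)/M apart (matching the 0 to about 11014 Hz range quoted for FIG. 5), and nearest-note rounding is assumed, so the exact assignments need not match those read off FIG. 6.

```python
import math
from collections import Counter

def notes_hit_by_linear_bins(sample_rate=22050.0, num_points=1024):
    """Count how many equally spaced measurement points fall on each MIDI note."""
    spacing = (sample_rate / 2) / num_points  # assumed spacing of measurement points in Hz
    counts = Counter()
    for m in range(1, num_points):  # m = 0 is 0 Hz, which has no pitch
        freq = m * spacing
        # Nearest note number, with note 69 at 440 Hz and 12 notes per octave.
        note = round(69 + 12 * math.log2(freq / 440.0))
        if 0 <= note <= 127:
            counts[note] += 1
    return counts
```

Inspecting the result shows the same qualitative picture as the text: many low note numbers receive no measurement point at all, while a single high note number collects dozens of them.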

【００４３】Conversely, in the high-frequency region, a total of 54 measurement points, from the 970th to the 1023rd, are assigned to the same note number n = 124. In this case there is no difficulty in defining the intensity for note number n = 124 as the average of the spectral intensities at all 54 measurement points, but computing as many as 54 measurement points to obtain the intensity value of a single note number is itself inefficient.

【００４４】In short, to convert efficiently into nonlinear codes such as MIDI data, M measurement points should be defined discretely on the frequency axis to match the required codes, and only the spectral intensities of the frequency components of the acoustic signal corresponding to those M measurement points should be computed. For conversion into MIDI data in particular, the M measurement points should be defined discretely so that they are equally spaced on a logarithmic frequency axis. In other words, they should be defined so that their frequencies form a geometric progression. FIG. 7 shows part of the measurement points defined in this way. The illustrated measurement points are assigned note numbers n = 60 to 65 and are equally spaced on the logarithmic frequency axis. Their specific frequency values, 262, 278, 294, ..., form a geometric progression. When the spectral intensity is computed by Fourier transform, computing only the spectral intensity at these measurement points eliminates wasted computation.

【００４５】A specific method for carrying out this efficient computation without waste is described below. For convenience of explanation, the procedure for applying a general Fourier transform to the encoding method of the present invention is described first. Consider the case where the acoustic signal shown in FIG. 8 is Fourier-transformed and encoded. As already stated, in the present invention a unit section is set on the time axis of the acoustic signal, and this unit section is represented by P representative codes. The unit section di shown in FIG. 8 is the i-th unit section, with section length L, and is assumed here to contain K samples. That is, with the sampling frequency denoted F and the section length L expressed in units of time, K / F = L. The reference time t = t0 is set at the left end of the acoustic signal, the time at the left end of unit section di is the section start time t = ts, and the time at its right end is the section end time t = te. Further, the time from the reference time t0 to the section start time ts is denoted Δth, and the number of samples contained within Δth is denoted h.

【００４６】Now consider obtaining, by this Fourier transform, a Fourier spectrum such as that shown in FIG. 9. In this Fourier spectrum, M measurement points are defined on the frequency axis; the m-th measurement point (m = 0, 1, 2, ..., M−1) corresponds to frequency f(m) and has spectral intensity S(m). As already stated, in the conventional general Fourier transform the M measurement points are defined at equal intervals on a linear frequency axis. The basic principle of the Fourier transform is to prepare reference signals consisting of sine and cosine functions of various frequencies, to compute the correlation between the acoustic signal to be transformed and each reference signal, and to express the degree of correlation as the spectral intensity. For example, in FIG. 9, the spectral intensity S(m) at the m-th measurement point, which corresponds to frequency f(m), is a value indicating the degree of correlation with the reference signal of the same frequency f(m). In short, to obtain a Fourier spectrum such as that of FIG. 9 for the acoustic signal in unit section di, the acoustic signal in that unit section is compared with each of the reference signals of frequencies f(0) to f(M−1), and the respective degrees of correlation are obtained as the spectral intensities S(0) to S(M−1).

【００４７】The basic technique for computing this correlation is explained with reference to FIG. 10. The signal waveform in the upper part of FIG. 10 is the waveform of the acoustic signal to be Fourier-transformed, and the signal waveform in the lower part is the waveform of the reference signal (a cosine function in this example) with the m-th frequency f(m). Both waveforms use the reference time t = t0 as the origin of the time axis, and their amplitudes are normalized to lie within the range −1 to +1. The value indicating the correlation between the acoustic signal waveform contained in the unit section di set on the time axis of the upper graph and the reference signal of frequency f(m) shown in the lower graph, that is, the spectral intensity S(m) at frequency f(m), can be obtained by the equation shown in FIG. 11. A transform using this equation is called a cosine transform (a transform that ignores the imaginary component of the Fourier transform). The equation of the Fourier transform proper is the one shown in FIG. 12, but for convenience the cosine transform equation of FIG. 11 is explained first.

【００４８】In the equation of FIG. 11, the term A(h+k) on the right-hand side is the amplitude of the k-th sample (k = 0, 1, 2, ..., K−1) in the i-th unit section di of the acoustic signal. In the upper graph of FIG. 10, the number of samples contained in the time Δth from the reference time t0 to the section start time ts is h, so the k-th sample counted from the section start time ts is the (h+k)-th sample counted from the reference time t0. The amplitude of the (h+k)-th sample counted from the reference time t0 is therefore A(h+k), and if the time from the section start time ts to that sample is Δtk, the time from the reference time t0 to that sample is (Δth + Δtk).

【００４９】The term cos(2π·f(m)·(Δth + Δtk)) on the right-hand side of the equation of FIG. 11 is the amplitude of the reference signal (cosine function) of frequency f(m) at the position corresponding to that sample. That is, in the lower graph of FIG. 10, it is the amplitude of the reference signal at the position separated from the reference time t0 by the time (Δth + Δtk), the same position as the (h+k)-th sample in the upper graph. The product of the term A(h+k) and the term cos(2π·f(m)·(Δth + Δtk)) is taken on the right-hand side in order to obtain the correlation between the two at this particular position on the time axis. Since unit section di contains K samples in total, the value indicating the correlation is obtained in the same way for all K samples, and these values are summed. The Σ symbol in the equation of FIG. 11 denotes the sum over k = 0, 1, 2, ..., (K−1), and the factor (1/K) at the head of the right-hand side divides by the number of samples K to give the average correlation. As stated above, the amplitudes of both the acoustic signal and the reference signal are normalized to lie within the range −1 to +1, so the greater the degree of correlation, the larger the value of the spectral intensity S(m). The resulting value of S(m) therefore indicates the strength of the component of frequency f(m) contained in the acoustic signal waveform in unit section di.

【００５０】In the Fourier transform, the equation shown in FIG. 12 is used in place of the equation shown in FIG. 11. The term W(k) on the right-hand side of the equation of FIG. 12 is a weight function applied over the section length L; it gives the weight applied to the amplitude A(h+k) of the k-th sample in unit section di (the (h+k)-th sample counted from the reference time t0). This weight function W(k) is as described in §2. The term exp(−j2πf(m)·(h+k)/F) on the right-hand side expands, as also shown in FIG. 12, into cos(2π·f(m)·(h+k)/F) − j sin(2π·f(m)·(h+k)/F), and represents the complex intensity of a trigonometric function with the amplitude of the cosine function on the real axis and the amplitude of the sine function on the imaginary axis. Since F is the sampling frequency, (h+k)/F = Δth + Δtk, and the cosine term is identical to the cosine term in the equation of FIG. 11. In short, whereas the cosine transform equation of FIG. 11 considers only the correlation with the cosine function, the Fourier transform equation of FIG. 12 considers the correlation with both the cosine function and the sine function, which removes the effect of any phase shift between the acoustic signal and the reference signal. In addition, multiplying by the weight function W(k) as described above in the equation of FIG. 12 further emphasizes the difference between adjacent unit sections.
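The correlation described for FIG. 12 can be sketched as a direct sum over the K samples of one unit section. This is a minimal illustration rather than the patent's own code: the function and parameter names are hypothetical, and the leading 1/K factor is assumed to carry over from the FIG. 11 equation because the text does not restate it for FIG. 12.

```python
import cmath
import math

def complex_intensity(samples, h, K, freq, sample_rate, window=None):
    """Correlate K samples starting at index h with exp(-j*2*pi*freq*(h+k)/F).

    `window` is an optional weight function W(k); without it all weights are 1.
    Returns the complex intensity; its absolute value does not depend on the
    phase of the input signal.
    """
    total = 0j
    for k in range(K):
        w = 1.0 if window is None else window(k)
        angle = 2 * math.pi * freq * (h + k) / sample_rate
        total += w * samples[h + k] * cmath.exp(-1j * angle)
    return total / K  # 1/K averaging assumed, as in the FIG. 11 equation
```

For a unit-amplitude sinusoid whose frequency completes a whole number of cycles within the section, the magnitude of the result is 0.5 at the matching frequency whatever the phase of the sinusoid, which illustrates why using both the cosine and sine terms removes the effect of phase shift.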

【００５１】Using the equation of FIG. 12 in this way gives the spectral intensity S(m) for the m-th frequency f(m), so performing the same computation for all of m = 0, 1, 2, ..., (M−1) yields a Fourier spectrum such as that shown in FIG. 9. In the conventional general Fourier transform, however, the M measurement points are defined at equal intervals on a linear frequency axis, as stated above, and are set, for example, as f(m) = F·m/M (where m = 0, 1, 2, ..., M−1), as shown in FIG. 13. Specifically, with a sampling frequency F = 22.05 kHz and M = 1024, M measurement points with the frequencies f(m) listed in the table of FIG. 13 are defined (in practice, by the sampling theorem, correct spectral intensities cannot be obtained for the frequency range exceeding 1/2 of the sampling frequency F). As already explained, if a Fourier spectrum obtained with measurement points defined at equal intervals on a linear frequency axis is used for conversion into a code system with nonlinear characteristics, such as MIDI data, then, as shown in FIG. 6, note numbers are missing in the low-frequency region and results are computed with excessively redundant frequency precision in the high-frequency region, which is extremely inefficient.

【００５２】In the present embodiment, therefore, a total of 128 measurement points are defined so as to be equally spaced on a logarithmic frequency axis, for example by the equation shown in FIG. 14:

$$f(m) = 440 \cdot 10^{\gamma(n)} \qquad (n = 0, 1, 2, \dots, 127)$$

Here n is the note number of the MIDI data, and

$$\gamma(n) = (n - 69) \cdot \frac{\log 2}{12}.$$

The "12" corresponds to the number of semitones in one octave (the interval over which the frequency doubles). The table of FIG. 14 shows the relationship among the note number n, γ(n), and f(m). As shown, for note number 69 (corresponding to the note A, "A3", near the center of the piano keyboard), γ(n) = 0 and the frequency f(m) = 440 Hz. The values of f(m) form a geometric progression and are equally spaced on a logarithmic frequency axis.
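The measurement-point frequencies of FIG. 14 follow directly from the two formulas above. In this sketch the function name is hypothetical, and the logarithm is taken to be base 10, which is the reading under which 10 raised to γ(n) doubles the frequency every 12 note numbers.

```python
import math

def note_frequency(n):
    """Frequency in Hz of MIDI note number n: 440 * 10 ** gamma(n)."""
    gamma = (n - 69) * math.log10(2) / 12  # base-10 log, consistent with 10 ** gamma
    return 440.0 * 10 ** gamma

# Frequencies of all 128 measurement points, n = 0 .. 127.
MEASUREMENT_FREQUENCIES = [note_frequency(n) for n in range(128)]
```

Note number 69 gives 440 Hz, note number 81 gives 880 Hz, and the ratio between any two adjacent entries is the constant 2 to the power 1/12, which is the geometric progression described in the text.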

【００５３】In short, instead of using the measurement points of the conventional general Fourier transform shown in the table of FIG. 13, the present invention performs the spectrum computation using the measurement points shown in the table of FIG. 14, so that only the computations needed for the frequency values required for encoding are carried out. The ultimate aim of the present invention is not to obtain a Fourier spectrum but to encode an acoustic signal, and the frequencies required for encoding (the frequencies corresponding to the codes used) are determined in advance. The basic technical idea of the present invention is to improve computational efficiency by computing only these predetermined frequency components (the frequency components listed in the f(m) column of the table of FIG. 14).

【００５４】When a general Fourier transform is performed, the fast Fourier transform (FFT) is normally used to shorten the computation time. The FFT presupposes that the M measurement points are defined at equal intervals on a linear frequency axis and that M = K, where K is the number of samples in a unit section. The FFT therefore cannot be used in the method according to the present invention. Nevertheless, when the Fourier transform according to the present invention was executed on the basis of the equation of FIG. 12 with a sampling frequency F = 22.05 kHz and K = 1024 samples per unit section, the computation finished in roughly twice the time required by a Fourier transform using the conventional FFT (which leaves gaps in the low-frequency note numbers). The method according to the present invention is therefore of sufficient practical value.

【００５５】In the example shown in FIG. 14, a total of 128 measurement points are set in order to cover the full range of MIDI note numbers n = 0 to 127. Depending on the MIDI sound source used for reproduction, however, not all of these note numbers are needed, so the computation time can be reduced further by calculating the spectrum intensity only for the note numbers required by the sound source in use. For example, when a piano sound source is used as the MIDI sound source for reproduction, the leftmost key of a typical piano is note number n = 21 and the rightmost key is note number n = 108, so it suffices to calculate the spectrum intensity within the range n = 21 to 108. Furthermore, if encoding is restricted to, say, C major only, the note numbers corresponding to the black keys of the piano become unnecessary, and the computation time can be reduced still further.
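The reduction in measurement points can be illustrated with a short sketch. The function and constant names are hypothetical; the only facts used are the piano range n = 21 to 108 given above and the standard MIDI convention that note numbers divisible by 12 are the pitch C.

```python
# Pitch classes (note number mod 12) of the piano's white keys, i.e. the C major scale.
WHITE_KEY_PITCH_CLASSES = {0, 2, 4, 5, 7, 9, 11}

def measurement_notes(lowest=0, highest=127, white_keys_only=False):
    """Return the note numbers for which the spectrum intensity must be computed."""
    notes = range(lowest, highest + 1)
    if white_keys_only:
        return [n for n in notes if n % 12 in WHITE_KEY_PITCH_CLASSES]
    return list(notes)

full_range = measurement_notes()                 # every MIDI note number
piano = measurement_notes(21, 108)               # range of a typical piano
piano_c_major = measurement_notes(21, 108, True) # white keys of the piano only
```

The three lists contain 128, 88, and 52 note numbers respectively, so the C major restriction leaves well under half of the original measurement points.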

【００５６】The basic technique of the audio signal encoding method according to the present invention has been described above. The inventor has found, however, that even better results are obtained by making a small refinement to this technique: instead of obtaining the correlation between the audio signal and the reference signal under the phase relationship shown in FIG. 10, the correlation is obtained under the phase relationship shown in FIG. 15. The difference is that in the former the reference point of the reference signal on the time axis is set to the reference time t = t0, whereas in the latter it is set to the section start time ts. In other words, in the former the phase relationship between the audio signal and the reference signal is fixed, and the correlation is taken under this fixed phase relationship no matter which unit section is being processed. In the latter, the phase relationship between the audio signal and the reference signal changes each time a unit section is processed. For example, in FIG. 15 the reference signal for unit section di has the phase shown, but the reference signal for the following unit section d(i+1) is the illustrated reference signal with its phase shifted slightly to the right.

【００５７】When the correlation is obtained under the phase relationship shown in FIG. 10, the equation shown in FIG. 12 is used, as described above. When the correlation is obtained under the phase relationship shown in FIG. 15, the equation shown in FIG. 16 is used instead. The only difference is that the term (h + k) inside the exponential function of the former is replaced with k. This is because, as shown in the lower part of FIG. 15, the reference point of the reference signal on the time axis is now the section start time ts, so the time term inside the trigonometric functions becomes Δtk.
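The two forms of the equation can be placed side by side in a short sketch. The function names and the test signal are hypothetical, and the weight function is again assumed to be W(k) = 1; the exponents follow the forms stated in claims 7 and 8.

```python
import cmath
import math

def intensity_fixed_phase(samples, h, K, f, F):
    """First form: reference phase measured from the reference time t0 (sample 0)."""
    return sum(samples[h + k] * cmath.exp(-2j * math.pi * f * (h + k) / F)
               for k in range(K)) / K

def intensity_section_phase(samples, h, K, f, F):
    """Second form: reference phase restarts at the section start ts (sample h)."""
    return sum(samples[h + k] * cmath.exp(-2j * math.pi * f * k / F)
               for k in range(K)) / K

# Usage on an arbitrary test signal, for a unit section starting at sample h.
F, f, h, K = 22050.0, 440.0, 100, 128
samples = [math.sin(0.3 * x) + 0.5 * math.cos(0.11 * x) for x in range(300)]
fixed = intensity_fixed_phase(samples, h, K, f, F)
section = intensity_section_phase(samples, h, K, f, F)
# Phase factor by which the first form differs from the second for this section.
phase_factor = cmath.exp(-2j * math.pi * f * h / F)
```

Written this way, the complex value of the first form equals that of the second multiplied by `phase_factor`, which depends on the section start h and therefore changes from one unit section to the next.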

【００５８】The inventor compared the MIDI code data obtained from the same vocal audio signal by calculating the spectrum intensity under the phase relationship shown in FIG. 10 with the MIDI code data obtained by performing the intensity calculation under the phase relationship shown in FIG. 15. The latter MIDI code data was found, on the whole, to represent the original audio signal more accurately. The reason has not been analyzed in detail, but it is presumed that moving the reference point of the reference signal on the time axis from one unit section to the next disperses the probability of an erroneous correlation across the unit sections, so that the encoding is correct as a whole. Of course, when the original audio signal is something like an exact sine wave, performing the intensity calculation under the fixed phase relationship of FIG. 10 would be expected to give the more accurate encoding. For an irregular waveform such as a vocal audio signal, however, performing the intensity calculation under the varying phase relationship of FIG. 15 disperses the correlation detection errors and is considered to give a more suitable encoding.

【００５９】§4. Code code integration processing
As described in §2 above, when partially overlapping unit sections are set, the number of code codes created increases considerably. This section describes an integration process that is effective for reducing the number of code codes finally created as far as possible.

【００６０】For example, consider a case where code codes represented by the notes shown in FIG. 17(a) have been created. In the illustrated example, every code code is an eighth note. This is because the section length L is constant, so every code code created has the same length. The group of notes shown in FIG. 17(a) can, however, be rewritten as shown in FIG. 17(b). That is, when several notes of the same pitch are arranged consecutively, they can be integrated into a single note. In other words, the notes of the individual unit sections can be replaced by one note spanning several unit sections.
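The integration of consecutive notes of the same pitch can be sketched as a run-length merge. The function name and the sample note sequence are hypothetical; each input element stands for the note number of one unit section.

```python
def merge_repeated_notes(notes):
    """Collapse runs of identical note numbers into (note, length_in_unit_sections) pairs."""
    merged = []
    for n in notes:
        if merged and merged[-1][0] == n:
            # Same pitch as the previous unit section: lengthen the existing note.
            merged[-1] = (n, merged[-1][1] + 1)
        else:
            merged.append((n, 1))
    return merged

merged = merge_repeated_notes([60, 60, 62, 62, 62, 64, 60])
```

Here seven one-section notes become four notes, with lengths of 2, 3, 1, and 1 unit sections.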

【００６１】In the example shown in FIG. 17, only notes of the same pitch were integrated, but the notes to be integrated need not be limited to notes of the same pitch; notes with a certain degree of similarity may also be integrated. For example, a run of notes that differ from one another by only one scale step can be treated as an integration target and replaced with a single note, for instance the lowest-pitched note in the run. Generalizing, when adjacent unit sections have representative code codes that are similar to one another under a predetermined condition, the number of notes can be reduced by replacing these similar representative code codes with an integrated code code spanning the unit sections.

【００６２】FIG. 17 explained the concept of code code integration using the integration of notes as an example, but each code code created by the encoding process according to the present invention also carries data indicating its intensity (the velocity, in the case of MIDI data). When code codes are integrated, their intensity data must therefore be integrated as well. If the code codes to be integrated have different intensity data, the largest intensity value may, for example, be taken as the intensity data of the integrated code code. With MIDI data, however, integrating two code codes produces an unnatural result when the intensity of the following code code is considerably greater than that of the preceding one. This is because the sound reproduced by an ordinary MIDI sound source consists of instrument performance sounds, whose intensity generally decays with time. Replacing two code codes with one integrated code code therefore causes no unnaturalness when the following code code is weaker than the preceding one, but does in the opposite case. It is therefore preferable to set a condition such that integration is not performed when the intensity difference between the two code codes is at or above a predetermined threshold and the following code code is stronger than the preceding one.
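An integration rule that takes intensity into account can be sketched as follows. The `Code` record, the function names, and the default thresholds are hypothetical choices for illustration; the source only states that the thresholds are predetermined. The merged code takes the lowest note and the largest velocity, as suggested in the text.

```python
from dataclasses import dataclass

@dataclass
class Code:
    note: int      # MIDI note number
    start: int     # index of the first unit section covered
    length: int    # number of unit sections covered
    velocity: int  # intensity (MIDI velocity)

def can_merge(prev, nxt, max_pitch_gap=0, max_velocity_rise=20):
    """Assumed similarity condition: contiguous, close in pitch, and not much louder."""
    contiguous = prev.start + prev.length == nxt.start
    similar = abs(prev.note - nxt.note) <= max_pitch_gap
    rises_too_much = nxt.velocity - prev.velocity >= max_velocity_rise
    return contiguous and similar and not rises_too_much

def merge_codes(codes, **conditions):
    """Integrate each code into the running merged code when the condition holds."""
    merged = []
    previous = None  # last original (unmerged) code seen
    for c in codes:
        if previous is not None and can_merge(previous, c, **conditions):
            m = merged[-1]
            merged[-1] = Code(min(m.note, c.note), m.start,
                              m.length + c.length, max(m.velocity, c.velocity))
        else:
            merged.append(c)
        previous = c
    return merged

codes = [Code(60, 0, 1, 90), Code(60, 1, 1, 70), Code(60, 2, 1, 110), Code(62, 3, 1, 80)]
result = merge_codes(codes)
```

In this example the first two codes are integrated because the second is weaker, while the third stays separate because its velocity rises by 40 over its predecessor.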

【００６３】The general MIDI standard allows code codes to be recorded on multiple tracks. In practice, therefore, the code codes created in the present invention are also recorded on multiple tracks. For example, FIG. 3(b) shows representative code codes (notes, in the illustrated example) recorded on three tracks T1 to T3. In this case, when representative code codes arranged adjacently on the same track satisfy a predetermined similarity condition, they are integrated into a single representative code code.

【００６４】As described above, integrating code codes has the advantage of reducing their number, so it is desirable to arrange matters so that integration is promoted as much as possible. When the representative code codes are distributed among multiple tracks, it is therefore preferable to adjust the order of distribution so as to raise the probability that code codes placed adjacently on the same track satisfy the similarity condition. Specifically, the code codes may be sorted by frequency before being assigned to the tracks. For example, when three code codes are distributed among three tracks T1, T2, and T3 as shown in FIG. 3(b), assigning the lowest-frequency code code to track T1, the next lowest to track T2, and the highest to track T3 is expected to raise the probability that notes eligible for integration appear, compared with distributing them without regard to frequency.
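The frequency-sorted distribution can be sketched as follows. The function name and the data layout are hypothetical: each unit section is assumed to supply at most as many (note number, velocity) pairs as there are tracks, and sorting by note number is used as a stand-in for sorting by frequency.

```python
def distribute_to_tracks(sections, track_count):
    """Assign the codes of each unit section to tracks in ascending pitch order.

    sections: one list of (note, velocity) pairs per unit section.
    Returns track_count lists; track 0 receives the lowest note of every
    section, track 1 the next lowest, and so on. Empty slots are None.
    """
    tracks = [[] for _ in range(track_count)]
    for codes in sections:
        ordered = sorted(codes, key=lambda c: c[0])
        for t in range(track_count):
            tracks[t].append(ordered[t] if t < len(ordered) else None)
    return tracks

sections = [
    [(64, 80), (60, 100), (67, 70)],  # extraction order is by intensity, not pitch
    [(60, 95), (67, 72), (64, 78)],
]
tracks = distribute_to_tracks(sections, 3)
```

After sorting, each track holds the same note number in both unit sections, so every track contains a pair of adjacent codes that is a candidate for integration.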

【００６５】As in the example of FIG. 18, reorganizing the signal sections can promote the integration of code codes further. For example, suppose five code codes n3, n1, n2, n1, n3 (only the note numbers are shown) are arranged on one track as shown in FIG. 18(a). Each code code is drawn as a rectangle whose width indicates its signal section length and whose height indicates its signal intensity. Here the signal sections are reorganized in the following four stages (1) to (4).

【００６６】Stage (1): Delete any code code whose signal intensity is at or below a predetermined level and whose signal section length is at or below a predetermined length. Specifically, if the third code code n2 in FIG. 18(a) meets this condition, deleting it gives the state shown in FIG. 18(b).

【００６７】Stage (2): Extend the signal section length of each code code to the right by a predetermined length, within a range that does not overlap the adjacent code code. Specifically, the signal section lengths of the four code codes shown in FIG. 18(b) are extended, giving the state shown in FIG. 18(c).

【００６８】Stage (3): Integrate adjacently arranged code codes if they satisfy a predetermined similarity condition. This is the integration process described above. Specifically, the second code code n1 and the third code code n1 in FIG. 18(c) are integrated, creating an integrated code code n1 whose signal section joins the two, as shown in FIG. 18(d).

【００６９】Stage (4): Delete any code code whose signal section length is at or below a predetermined length. Here this threshold length is set larger than the predetermined length used in stage (1), so the first code code n3 shown in FIG. 18(d) is deleted, finally giving the state shown in FIG. 18(e).

【００７０】With the signal section reorganization described above, only two code codes finally remain.
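The four-stage reorganization can be sketched as follows. The `Code` record, the parameter names, the numeric values, and the use of "same note number" as the similarity condition are all assumptions chosen so that the example mirrors the sequence n3, n1, n2, n1, n3 of FIG. 18; the source does not give concrete thresholds.

```python
from dataclasses import dataclass, replace

@dataclass
class Code:
    note: int
    start: float   # start time of the signal section
    length: float  # signal section length
    level: float   # signal intensity

def reorganize(codes, weak_level, short_len, extend_by, min_len):
    # Stage (1): delete codes that are both weak and short.
    codes = [c for c in codes if not (c.level <= weak_level and c.length <= short_len)]
    # Stage (2): extend each code to the right without overlapping its successor.
    extended = []
    for i, c in enumerate(codes):
        limit = codes[i + 1].start - c.start if i + 1 < len(codes) else c.length + extend_by
        extended.append(replace(c, length=min(c.length + extend_by, limit)))
    # Stage (3): integrate codes that now touch and share a note number (assumed condition).
    merged = []
    for c in extended:
        p = merged[-1] if merged else None
        if p is not None and p.note == c.note and abs(p.start + p.length - c.start) < 1e-9:
            merged[-1] = replace(p, length=p.length + c.length, level=max(p.level, c.level))
        else:
            merged.append(c)
    # Stage (4): delete codes whose section length is still at or below min_len.
    return [c for c in merged if c.length > min_len]

N1, N2, N3 = 60, 62, 64  # hypothetical note numbers standing in for n1, n2, n3
track = [Code(N3, 0.0, 1.0, 5.0), Code(N1, 2.0, 2.0, 6.0), Code(N2, 4.0, 0.5, 1.0),
         Code(N1, 5.0, 2.0, 6.0), Code(N3, 8.0, 3.0, 5.0)]
result = reorganize(track, weak_level=2.0, short_len=1.0, extend_by=1.0, min_len=2.0)
```

With these values, stage (1) removes n2, stage (2) closes the gap between the two n1 codes, stage (3) joins them, and stage (4) removes the short leading n3, leaving two codes.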

【００７１】[0071]

[Effects of the Invention] As described above, the encoding method according to the present invention makes it possible to convert efficiently into non-linear code data such as MIDI data.

[Brief description of the drawings]

FIG. 1 is a diagram showing the basic principle of the audio signal encoding method according to the present invention.

FIG. 2 is a diagram showing code codes created based on the intensity graph shown in FIG. 1(c).

FIG. 3 is a diagram showing code codes created by setting unit sections so that they partially overlap on the time axis.

FIG. 4 is a diagram showing a specific example of setting unit sections that partially overlap on the time axis.

FIG. 5 is a graph showing an example of a Fourier spectrum with the frequency axis on a linear scale.

FIG. 6 is a graph showing an example of a Fourier spectrum with the frequency axis on a logarithmic scale.

FIG. 7 is a graph showing the correspondence between note numbers and a Fourier spectrum with the frequency axis on a logarithmic scale.

FIG. 8 is a diagram showing the settings for the computation that yields a Fourier spectrum.

FIG. 9 is a graph showing the spectrum intensities obtained for M measurement points defined on the frequency axis.

FIG. 10 is a diagram showing a first computation technique for obtaining a Fourier spectrum using a Fourier transform.

FIG. 11 is a diagram explaining the basic equation for obtaining the spectrum intensity S(m) at a given frequency f(m).

FIG. 12 is a diagram explaining a first equation for obtaining the spectrum intensity S(m) at a given frequency f(m).

FIG. 13 is a table showing specific values of the frequencies f(m) of measurement points defined at equal intervals on a linear frequency axis.

FIG. 14 is a table showing specific values of the frequencies f(m) of measurement points defined at equal intervals on a logarithmic frequency axis.

FIG. 15 is a diagram showing a second computation technique for obtaining a Fourier spectrum using a Fourier transform.

FIG. 16 is a diagram explaining a second equation for obtaining the spectrum intensity S(m) at a given frequency f(m).

FIG. 17 is a diagram showing an example in which the amount of code data is reduced by unit section integration processing.

FIG. 18 is a diagram showing an example in which the amount of code data is reduced by signal section reorganization processing.

[Explanation of symbols]

A: complex intensity
A(h+k): amplitude value of the (h+k)-th sample counted from the reference time t0
d1 to d5: unit sections
E: effective intensity
e(i, j): effective intensity of code code n(i, j)
F: sampling frequency
f: frequency
f(m): frequency of the m-th measurement point
h: number of samples between the reference time t0 and the section start time ts of the i-th unit section
K: number of samples in one unit section
k: number of the sample of interest within one unit section
L: section length of a unit section
ΔL: offset length
M: number of measurement points
m: measurement point number (m = 0, 1, 2, ..., M−1)
n, n1, n2, n3: note numbers
n(i, j): j-th code code extracted for unit section di
S(m): spectrum intensity at the m-th measurement point
T1 to T3: tracks
t1 to t6: times
t0: reference time
te: section end time of the i-th unit section di
ts: section start time of the i-th unit section di
Δth, Δtk: time widths

Claims

[Claims]

1. An encoding method for encoding an audio signal given as a time-series intensity signal, the method comprising: a section setting step of setting a plurality of unit sections on the time axis of the audio signal to be encoded; a code definition step of discretely defining M measurement points on a frequency axis and defining M code codes corresponding to the respective measurement points; an intensity calculation step of obtaining, for each unit section, the spectrum intensities of the frequency components corresponding to the M measurement points contained in the audio signal within that unit section; and an encoding step of extracting, for each unit section and based on the spectrum intensities obtained in the intensity calculation step, P representative code codes representing that unit section from among the M code codes, and representing the audio signal of each unit section by the extracted representative code codes and their spectrum intensities.

2. The encoding method according to claim 1, wherein, in the code definition step, the M measurement points are discretely defined so as to be equally spaced on a logarithmic frequency axis.

3. The encoding method according to claim 2, wherein, in the code definition step, note numbers used in MIDI data are used as the M code codes, and, in the encoding step, the audio signal of each unit section is represented by MIDI-format code data including a note number extracted as a representative code code, a velocity determined based on its spectrum intensity, and a delta time determined based on the length of the unit section.

4. The encoding method according to claim 1, wherein, in the intensity calculation step, when the spectrum intensity S(m) at the m-th measurement point corresponding to frequency f(m) is calculated, an operation is performed to obtain the correlation with the M sine functions and cosine functions having the frequencies corresponding to the respective measurement points.

5. The encoding method according to claim 1, wherein, in the intensity calculation step, a weight function defining a weighting over the section length of a unit section is prepared, and the spectrum intensity is obtained by multiplying the audio signal in the unit section by the weight function.

6. The encoding method according to claim 1, wherein, in the section setting step, the unit sections are set so that adjacent unit sections partially overlap on the time axis.

7. The encoding method according to claim 1, wherein the audio signal to be encoded is sampled at a predetermined sampling frequency F and captured as audio data in which the amplitude value of the x-th sample is A(x); the unit sections are set on the captured audio data; and, in the intensity calculation step, for a unit section that starts at the h-th sample and contains K samples in total, the spectrum intensity S(m) at the m-th measurement point corresponding to frequency f(m) is calculated using a predetermined weight function W(k) by the formula:

$$S(m) = \frac{1}{K} \sum_{k=0}^{K-1} W(k)\, A(h+k)\, \exp\!\left(-j\, 2\pi f(m)\, \frac{h+k}{F}\right)$$

8. The encoding method according to claim 1, wherein the audio signal to be encoded is sampled at a predetermined sampling frequency F and captured as audio data in which the amplitude value of the x-th sample is A(x); the unit sections are set on the captured audio data; and, in the intensity calculation step, for a unit section that starts at the h-th sample and contains K samples in total, the spectrum intensity S(m) at the m-th measurement point corresponding to frequency f(m) is calculated using a predetermined weight function W(k) by the formula:

$$S(m) = \frac{1}{K} \sum_{k=0}^{K-1} W(k)\, A(h+k)\, \exp\!\left(-j\, 2\pi f(m)\, \frac{k}{F}\right)$$

9. The encoding method according to claim 1, wherein, in the encoding step, the P representative code codes extracted for each unit section are distributed among a plurality of tracks, and, when representative code codes arranged adjacently on the same track satisfy a predetermined similarity condition, the adjacently arranged representative code codes are integrated into a single representative code code.

10. The encoding method according to claim 9, wherein, when the P representative code codes are distributed among the plurality of tracks, the order of distribution is adjusted so as to raise the probability that representative code codes arranged adjacently on the same track satisfy the similarity condition.

11. A computer-readable recording medium on which is recorded an audio signal encoding program for executing the encoding method according to claim 1.