JP3894722B2 - Stereo audio signal high efficiency encoding device

Info

Publication number: JP3894722B2
Application number: JP2000327885A
Authority: JP
Inventors: 清隆永井
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 2000-10-27
Filing date: 2000-10-27
Publication date: 2007-03-22
Anticipated expiration: 2020-10-27
Also published as: JP2002132295A

Description

【０００１】
【発明の属する技術分野】
本発明は、ステレオオーディオ信号をブロックに分割して高能率で符号化するステレオオーディオ信号高能率符号化装置に関するものである。
【０００２】
【従来の技術】
近年、オーディオ信号に対する高能率符号化方式として、変換符号化を利用した方式が広く用いられており、この変換符号化を利用した方式としては、ＭＰＥＧ２のＡＡＣ（ＡｄｖａｎｃｅｄＡｕｄｉｏＣｏｄｉｎｇ）やドルビーディジタルが挙げられる。また、他の高能率符号化方式として、サブバンドフィルタと変換符号化とを組み合わせたハイブリッド符号化もあり、このハイブリッド符号化を利用した方式としては、ＭＰＥＧ１およびＭＰＥＧ２のレーヤ３が挙げられる。
【０００３】
以上の方式のうち変換符号化を利用して、ステレオオーディオ信号に対し、高能率に符号化する従来のステレオオーディオ信号高能率符号化装置について、以下に説明する。
【０００４】
図６は変換符号化を利用したステレオオーディオ信号高能率符号化装置の全体構成を示すブロック図である。図６において、６００と６０１は周波数変換部、６１０と６１１はブロック長決定部、６２０はジョイントステレオ信号生成部、６３０は量子化及び符号化部である。
【０００５】
以上のように構成されたステレオオーディオ信号高能率符号化装置について、その動作を以下に述べる。
まず、入力された左チャンネル（Ｌｃｈ）の時間軸のオーディオ信号は、周波数変換部６００において、ブロック長決定部６１０で決定された長さのブロックに分割され、直交変換されて周波数軸のスペクトル係数に変換される。
【０００６】
これに対し、ハイブリッド符号化方式の場合には、入力されたオーディオ信号はフィルタバンクでサブバンド信号に分割され、各サブバンド信号はブロック長決定部６１０で決定された長さのブロックに分割され、その後は同様に直交変換されて周波数軸のスペクトル係数に変換される。
【０００７】
この場合の直交変換としては、主としてＭＤＣＴ（変形離散コサイン変換）やＦＦＴ（高速フーリエ変換）が用いられる。
同様に、入力された右チャンネル（Ｒｃｈ）の時間軸のオーディオ信号は、周波数変換部６０１において、ブロック長決定部６１１で決定された長さのブロックに分割され、直交変換されて周波数軸のスペクトル係数に変換される。
【０００８】
ブロック長決定部６１０、６１１では、それぞれのチャンネルの信号に基づき周波数変換時のブロックの時間長（ブロック長）を決定する。また、ブロック長決定部６１０、６１１では、それぞれのチャンネルの信号の変化に応じてブロック長すなわち直交変換長を変化させることにより、プリエコーと呼ばれる疑似信号による音質の劣化を防止する。
【０００９】
図７は従来のステレオオーディオ信号高能率符号化装置におけるブロック長とプリエコーの関係を示す説明図である。同図では、直交変換としてＭＤＣＴを用いたときの図であり、隣接ブロックは５０％オーバーラップしている。
【００１０】
なお、図７（ａ）は入力信号波形を、図７（ｂ）は図７（ａ）の入力信号を長いブロック長を使って符号化・復号化した信号波形と変換に用いた窓の波形を、図７（ｃ）は図７（ａ）の入力信号を短いブロック長を使って符号化・復号化した信号波形と変換に用いた窓の波形を示す。また、図７（ｂ）でＴＬは長いブロック長の時間を、図７（ｃ）でＴＳは短いブロック長の時間を示す。図７ではＴＳはＴＬの１／４の時間である。
【００１１】
図７（ａ）に示すような急激な立ち上がりを含む信号を、図７（ｂ）のような長いブロック長で変換符号化すると、振幅の大きな部分の引き起こす量子化ノイズが振幅の小さい部分に広がり、疑似信号を発生する。一方、図７（ｃ）に示すように短いブロック長で変換符号化すると、振幅の大きな部分の引き起こす量子化ノイズは短いブロックの中に閉じ込められる。
【００１２】
大きな振幅の信号による量子化ノイズは変換ブロック全体に発生するが、聴覚の前向性マスキング（フォワードマスキング）の方が後向性マスキング（バックワードマスキング）より作用を及ぼす時間が長いので、大きな信号の後のノイズは大きな信号の前に発生するノイズと比較して知覚されにくい。
【００１３】
大きな信号の前に発生するノイズは本来の信号が聞こえる前に聞こえるのでプリエコーと呼ばれ、品質を大きく劣化させる。また、大きな信号の後に発生するノイズはポストエコーと呼ばれる。
【００１４】
したがって、変換符号化方式やハイブリッド符号化方式では、急激な立ち上がり信号に対しては、短いブロック長を選択することによりプリエコーを抑圧する。また、急峻な立下り信号に対して短いブロック長を選択することによりポストエコーを抑圧する。ただし、前述のようにポストエコーはプリエコーより聞こえにくいので、ポストエコーを抑圧しないこともある。
【００１５】
従来のステレオオーディオ信号高能率符号化装置におけるブロック長決定部６１０、６１１としては、例えば特開平３−２６３９２６号公報に記載されたものが知られている。
【００１６】
図８は従来のステレオオーディオ信号高能率符号化装置における左チャンネルブロック長決定部６１０と右チャンネルブロック長決定部６１１の構成を示すブロック図である。図８において、８００と８０１はセグメント信号レベル算出器、８１０と８１１はセグメント信号レベルメモリ、８２０と８２１は信号レベル変化検出器、８３０と８３１はブロック長判定器である。左チャンネルブロック長決定部６１０は、セグメント信号レベル算出器８００、セグメント信号レベルメモリ８１０、信号レベル変化検出器８２０、ブロック長判定器８３０から構成され、また、右チャンネルブロック長決定部６１１は、セグメント信号レベル算出器８０１、セグメント信号レベルメモリ８１１、信号レベル変化検出器８２１、ブロック長判定器８３１から構成される。
【００１７】
なお、左チャンネルブロック長決定部６１０と右チャンネルブロック長決定部６１１の構成と動作は同一であるので、以下、左チャンネルブロック長決定部６１０の動作について述べ、右チャンネルブロック長決定部６１１の動作については説明を省略する。
【００１８】
まず、入力された左チャンネルオーディオ信号は、セグメント信号レベル算出器８００で、最も短いブロックより小さい時間のセグメントに分割され、各セグメントの信号レベルをセグメント内の信号の２乗値の和、すなわちエネルギで算出する。
【００１９】
セグメント信号レベルメモリ８１０では、セグメント信号レベル算出器８００で算出された各セグメントの信号レベルを記憶する。信号レベル変化検出器８２０では、セグメント信号レベルメモリ８１０から読み出したセグメントの信号レベルを用いて、隣接するセグメントの信号レベルの比を求めて出力する。
【００２０】
ブロック長判定器８３０では、信号レベル変化検出器８２０からの信号レベルの比が閾値を越えるときには、短いブロック長を表す信号を出力し、そうでないときには長いブロック長を表す信号を出力する。
【００２１】
以上のようにして、信号の急激な上昇を検出したときには、短いブロック長を表す信号を出力することにより、プリエコーを抑圧することができる。
図６のジョイントステレオ信号生成部６２０では、ブロック長、スペクトル係数を入力として、ジョイントステレオ符号化に必要なジョイントステレオ信号を生成する。ここで、ジョイントステレオ信号とは、ミッド／サイドステレオ（和差信号）符号化に必要な左チャンネルと右チャンネルの周波数スペクトルの和信号と差信号、あるいはインテンシティステレオ符号化に必要な左チャンネルと右チャンネルの周波数スペクトルの和信号のことである。
【００２２】
図９は従来のステレオオーディオ信号高能率符号化装置におけるミッド／サイドステレオ符号化の説明図である。図９に示すように、ミッド／サイドステレオ符号化とは、左右のチャンネルの周波数スペクトルを直接符号化する代わりに、左右のチャンネルの周波数スペクトルの和の１／２の信号（ミッド信号、もしくは和信号）と、差の１／２の信号（サイド信号、もしくは差信号）を符号化するものである。
【００２３】
ミッド／サイドステレオでは、図９に示すように、左右両チャンネルの周波数スペクトルに類似性がある場合に、左／右周波数スペクトルを直接符号化するよりもミッド／サイド周波数スペクトルを符号化する方が、周波数スペクトルを符号化するのに必要なビット数は少なくて済む。以上のようにして、ミッド／サイドステレオ符号化を適用することにより、符号化効率を改善し、かつ音質を向上することができる。
【００２４】
また、インテンシティステレオとは、所定の周波数（通常３ｋＨｚから６ｋＨｚ）以上では、聴覚的には、スペクトルの微細構造よりもスペクトルのエンベロープの方が重要であることを利用して、上記の所定の周波数以上では、周波数スペクトルの情報としては左右のチャンネルの周波数スペクトルの和信号のみを符号化し、エンベロープ情報のみ左右のチャンネル別々に符号化することにより、符号化効率を改善するものである。
【００２５】
エンベロープ情報としては、和信号と各チャンネルの信号のエネルギーの比を送る。インテンシティステレオ符号化では、ミッド／サイドステレオ符号化よりも高い符号化効率を実現できるが、スペクトルの微細構造が再現できないことにより、音質劣化が生じることがある。
【００２６】
図６の量子化及び符号化部６３０では、左右ステレオ信号あるいはジョイントステレオ信号生成部６２０からのジョイントステレオ信号に対して、聴覚モデルに基づいて、スペクトル係数のマスキングレベル、すなわち許容量子化ノイズレベルを算出し、算出された許容量子化ノイズレベルに基づいてスペクトル係数の量子化を行い、ハフマン符号化等の符号化処理を行い、高能率符号化データを出力する。
【００２７】
【発明が解決しようとする課題】
しかしながら上記のような従来のステレオオーディオ信号高能率符号化装置では、図８に示すように、Ｌｃｈブロック長決定部６１０およびＲｃｈブロック長決定部６１１のそれぞれにおいて、入力信号に対するブロック長判定を各チャンネル毎に独立して行っているため、例えば左右のチャンネルで入力信号に対して判定したブロック長が異なっている時には、量子化及び符号化部６３０において、ミッド／サイドステレオ符号化やインテンシティステレオ符号化のような符号化効率の高いジョイントステレオ符号化が適用できなくなり、この場合には、ジョイントステレオ符号化を適用した場合と比較して、音質が劣化することがあるという問題点を有していた。
【００２８】
本発明は、上記従来の問題点を解決するもので、ステレオオーディオ信号を、ジョイントステレオ符号化方式に対して、従来にくらべてより良好に適用させることができ、その適用によってステレオオーディオ信号に対する符号化効率を改善して、この符号化信号に基づいて得られるステレオオーディオ信号の音質を向上することができるステレオオーディオ信号高能率符号化装置を提供する。
【００２９】
【課題を解決するための手段】
上記課題を解決するために本発明のステレオオーディオ信号高能率符号化装置は、ステレオオーディオ信号を入力信号として、各チャンネルのオーディオ信号をブロックに分割し、そのブロック長に基づいてジョイントステレオ符号化により高能率に符号化するステレオオーディオ信号高能率符号化装置であって、前記各チャンネルのオーディオ信号を所定の時間幅毎のセグメントに分割し、そのセグメントの信号レベルを算出する手段と、前記セグメントの信号レベルの変化から前記各チャンネル毎に独立にオーディオ信号の急激な上昇あるいは下降を検出する手段と、前記セグメントの信号レベルの変化量に基づいて、前記ジョイントステレオ符号化を適用しない場合に対応するブロックの時間長である単一チャンネルブロック長を、前記各チャンネルのオーディオ信号に対して独立に算出する手段と、各チャンネルの前記単一チャンネルブロック長が異なる場合には、各チャンネルのブロックの時間長を、共に前記単一チャンネルブロック長の短い方の時間長に決定し、そうでない場合には、各チャンネルの前記単一チャンネルブロック長を、そのまま各チャンネルのブロックの時間長に決定する手段とを備えた構成としたことを特徴とする。
【００３０】
以上により、二つのチャンネルの単一チャンネルブロック長に基づき、両者を統合して各チャンネルのブロックの時間長を決定することができる。
【００３１】
【発明の実施の形態】
本発明の請求項１に記載のステレオオーディオ信号高能率符号化装置は、ステレオオーディオ信号を入力信号として、各チャンネルのオーディオ信号をブロックに分割し、そのブロック長に基づいてジョイントステレオ符号化により高能率に符号化するステレオオーディオ信号高能率符号化装置であって、前記各チャンネルのオーディオ信号を所定の時間幅毎のセグメントに分割し、そのセグメントの信号レベルを算出する手段と、前記セグメントの信号レベルの変化から前記各チャンネル毎に独立にオーディオ信号の急激な上昇あるいは下降を検出する手段と、前記セグメントの信号レベルの変化量に基づいて、前記ジョイントステレオ符号化を適用しない場合に対応するブロックの時間長である単一チャンネルブロック長を、前記各チャンネルのオーディオ信号に対して独立に算出する手段と、各チャンネルの前記単一チャンネルブロック長が異なる場合には、各チャンネルのブロックの時間長を、共に前記単一チャンネルブロック長の短い方の時間長に決定し、そうでない場合には、各チャンネルの前記単一チャンネルブロック長を、そのまま各チャンネルのブロックの時間長に決定する手段とを備えた構成とする。
【００３３】
これらの構成によると、二つのチャンネルの単一チャンネルブロック長に基づき、両者を統合して各チャンネルのブロックの時間長を決定する。
請求項２に記載のステレオオーディオ信号高能率符号化装置は、ステレオオーディオ信号を入力信号として、各チャンネルのオーディオ信号をブロックに分割し、そのブロック長に基づいてジョイントステレオ符号化により高能率に符号化するステレオオーディオ信号高能率符号化装置であって、前記各チャンネルのオーディオ信号を所定の時間幅毎のセグメントに分割し、そのセグメントの信号レベルを算出する手段と、前記セグメントの信号レベルの変化から前記各チャンネル毎に独立にオーディオ信号の急激な上昇あるいは下降を検出する手段と、前記セグメントの信号レベルの変化量に基づいて、前記ジョイントステレオ符号化を適用しない場合に対応するブロックの時間長である単一チャンネルブロック長を、前記各チャンネルのオーディオ信号に対して独立に算出する手段と、前記各チャンネル間で、前記セグメントの信号レベルの変化を比較して、前記信号レベルの変化の類似性を判定する手段と、各チャンネルの前記信号レベルの変化に類似性があり、かつ各チャンネルの前記単一チャンネルブロック長が異なる場合には、各チャンネルのブロックの時間長を、共に前記単一チャンネルブロック長の短い方の時間長に決定し、そうでない場合には、各チャンネルの前記単一チャンネルブロック長を、そのまま各チャンネルのブロックの時間長に決定する手段とを備えた構成とする。
【００３４】
請求項３に記載のステレオオーディオ信号高能率符号化装置は、請求項２記載の各チャンネル間で、それらのセグメントの信号レベルが最も急激に変化したときの信号レベルの変化量と時間の違いが、所定の範囲内にある場合に、各チャンネルの信号レベルの変化に類似性があると判定する構成とする。
【００３６】
これらの構成によると、二つのチャンネルの信号レベルの変化の類似性を考慮して、この信号レベルの変化の類似性により、ジョイントステレオ符号化が効率的に動作することが期待できる場合においてのみ、二つのチャンネルの単一チャンネルブロック長に基づいて各チャンネルのブロックの時間長を決定する。
【００３７】
請求項４に記載のステレオオーディオ信号高能率符号化装置は、ステレオオーディオ信号を入力信号として、各チャンネルのオーディオ信号をブロックに分割し、そのブロック長に基づいてジョイントステレオ符号化により高能率に符号化するステレオオーディオ信号高能率符号化装置であって、前記各チャンネルのオーディオ信号を所定の時間幅毎のセグメントに分割し、そのセグメントの信号レベルを算出する手段と、前記セグメントの信号レベルの変化から前記各チャンネル毎に独立にオーディオ信号の急激な上昇あるいは下降を検出する手段と、前記セグメントの信号レベルの変化量に基づいて、前記ジョイントステレオ符号化を適用しない場合に対応するブロックの時間長である単一チャンネルブロック長を、前記各チャンネルのオーディオ信号に対して独立に算出する手段と、前記各チャンネル間で、前記セグメントの信号レベルを比較して、前記信号レベルの類似性を判定する手段と、各チャンネルの前記信号レベルに類似性があり、かつ各チャンネルの前記単一チャンネルブロック長が異なる場合には、各チャンネルのブロックの時間長を、共に前記単一チャンネルブロック長の短い方の時間長に決定し、そうでない場合には、各チャンネルの前記単一チャンネルブロック長を、そのまま各チャンネルのブロックの時間長に決定する手段とを備えた構成とする。
【００３８】
請求項５に記載のステレオオーディオ信号高能率符号化装置は、請求項４記載の各チャンネル間の信号レベルの類似性を、各チャンネルで対応するセグメントの信号レベルの和と、前記信号レベルの差との比を用いて判定する構成とする。
【００４０】
これらの構成によると、二つのチャンネルの信号レベルの類似性を考慮して、この信号レベルの類似性により、ジョイントステレオ符号化が効率的に動作することが期待できる場合においてのみ、二つのチャンネルの単一チャンネルブロック長に基づいて各チャンネルのブロックの時間長を決定する。
【００４４】
以下、本発明の一実施の形態を示すステレオオーディオ信号高能率符号化装置について、図面を参照しながら具体的に説明する。
図１は本実施の形態のステレオオーディオ信号高能率符号化装置の全体構成を示すブロック図である。図１において、１００と１０１は周波数変換部、１１０は統合ブロック長決定部、１２０はジョイントステレオ信号生成部、１３０は量子化及び符号化部である。
【００４５】
以上のように構成されたステレオオーディオ信号高能率符号化装置について、その動作を以下に述べる。
入力された左チャンネル（Ｌｃｈ）の時間軸のオーディオ信号は、周波数変換部１００において、統合ブロック長決定部１１０で決定された長さのブロックに分割され、直交変換されて周波数軸のスペクトル係数に変換される。同様に、入力された右チャンネル（Ｒｃｈ）の時間軸のオーディオ信号は、周波数変換部１０１において、統合ブロック長決定部１１０で決定された長さのブロックに分割され、直交変換されて周波数軸のスペクトル係数に変換される。本実施の形態では、直交変換としてＭＤＣＴを用いる。
【００４６】
統合ブロック長決定部１１０では、左右両方のチャンネルの信号に基づいて、左チャンネルと右チャンネルの周波数変換時のブロックの時間長を決定する。また、統合ブロック長決定部１１０では、ジョイントステレオ符号化の適用を考慮してブロック長すなわち直交変換長を決定し、プリエコーあるいはポストエコーと呼ばれる疑似信号による音質の劣化を防止する。
【００４７】
なお、ハイブリッド符号化方式の場合には、入力されたオーディオ信号はフィルタバンク（図示せず）でサブバンド信号に分割され、各サブバンド信号は統合ブロック長決定部１１０で決定された長さのブロックに分割され、直交変換されてスペクトル係数に変換される。
【００４８】
ジョイントステレオ信号生成部１２０では、左右のチャンネルのブロック長およびスペクトル係数を入力として、量子化及び符号化部１３０におけるジョイントステレオ符号化に必要なジョイントステレオ信号を生成する。このジョイントステレオ符号化として、本実施の形態では、ミッド／サイドステレオ符号化とインテンシティステレオ符号化を用いる。
【００４９】
すなわち、ジョイントステレオ信号生成部１２０で、ミッド／サイドステレオ符号化を適用する場合には、左チャンネルと右チャンネルの周波数スペクトルの和信号と差信号を生成し、またインテンシティステレオ符号化を適用する場合には、左チャンネルと右チャンネルの周波数スペクトルの和信号のみを生成する。
【００５０】
量子化及び符号化部１３０では、左右ステレオ信号、あるいはジョイントステレオ信号生成部１２０からのジョイントステレオ信号に対して、聴覚モデルに基づいて、スペクトル係数のマスキングレベル、すなわち許容量子化ノイズレベルを算出し、算出された許容量子化ノイズレベルに基づいて、スペクトル係数の量子化を行い、ハフマン符号化等の符号化処理を行い、高能率符号化データを出力する。
【００５１】
以上のように、本発明の特徴である統合ブロック長決定部を用いた実施の形態のステレオオーディオ信号高能率符号化装置について、統合ブロック長決定部１１０の各種構成例を挙げて、以下に詳細に説明する。
（実施の形態１）
図２は本実施の形態１のステレオオーディオ信号高能率符号化装置における統合ブロック長決定部の構成を示すブロック図である。図２において、２００と２０１はそれぞれ左と右の各チャンネルのセグメント信号レベル算出器、２１０と２１１はそれぞれ左と右の各チャンネルのセグメント信号レベルメモリ、２２０と２２１はそれぞれ左と右の各チャンネルの信号レベル変化検出器、２３０と２３１はそれぞれ左と右の各チャンネルのブロック長判定器、２４０はブロック長統合判定器である。
【００５２】
以上のように構成された本実施の形態１のステレオオーディオ信号高能率符号化装置における統合ブロック長決定部について、その動作を以下に述べる。
左チャンネルに入力されたオーディオ信号は、セグメント信号レベル算出器２００で最も短いブロックの時間と同じかそれより小さい時間のセグメントに分割され、各セグメントの信号レベルがセグメント内の信号の２乗値の和で算出される。
【００５３】
図３に本実施の形態におけるブロックとセグメントとの関係を示す。同図で上から順に長いブロック、短いブロック、セグメントを表し、破線は相対時間関係を示す。本実施の形態では、直交変換としてＭＤＣＴを用いているため、各ブロックは５０％オーバーラップしている。
【００５４】
また、通常は、長いブロックは５１２サンプルから２０４８サンプルで、また短いブロックは６４から２５６サンプルで構成されるが、本実施の形態では、長いブロックは１０２４サンプル、短いブロックは２５６サンプル、セグメントは１２８サンプルで構成されている。すなわち、短いブロック長（ＴＳ）は長いブロック長（ＴＬ）の１／４の時間で、セグメント長（Ｔ）は短いブロック長（ＴＳ）の１／２の時間である。
【００５５】
なお、セグメント信号レベル算出器２００で算出されるセグメントの信号レベルとしては、ダイナミックレンジを小さくするため、あるいは処理量を減らすために、上記セグメント内の信号の２乗値の和の代わりに信号の絶対値の和、あるいは信号の絶対値の最大をもちいてもよい。また、ハイブリッド符号化方式の場合には、サブバンドフィルタ（帯域通過フィルタ）の出力であるサブバンド信号を、セグメント信号レベル算出器２００に入力する。
【００５６】
次に、セグメント信号レベルメモリ２１０では、セグメント信号レベル算出器２００で算出された各セグメントの信号レベルをメモリに記憶する。ここで、セグメントｉの信号レベルをＳ（ｉ）とする。
【００５７】
信号レベル変化検出器２２０では、信号の立ち上り検出するため、セグメント信号レベルメモリ２１０から読み出した信号レベルをもちいて、セグメントｉの信号レベルＳ（ｉ）に対する直前のセグメントの信号レベルＳ（ｉ−１）の比、すなわちＳ（ｉ）／Ｓ（ｉ−１）を求め、その値を出力する。
【００５８】
また、信号レベル検出器２２０では、信号の立下りを検出するため、セグメント信号レベルメモリ２１０から読み出した信号レベルをもちいて、４個のセグメントの信号レベルの和Ｔ（４×ｉ）を求め、隣接する４個のセグメント毎の信号レベルの和の比、すなわちＴ（４×ｉ）／Ｔ（４×ｉ＋４）の値を出力する。このように、信号の立下りの場合、４個のセグメントの信号レベルの和を用いるのは、前記したように聴覚の前向性マスキングの方が後向性マスキングより作用を及ぼす時間が長いことに基づく。
【００５９】
次にブロック長判定器２３０は、信号レベル変化検出器２２０からのＳ（ｉ）／Ｓ（ｉ−１）の値が閾値を越えるときには、短いブロック長を表す信号を出力し、そうでないときには長いブロック長を表す信号を出力する。
【００６０】
以上のようにして、信号の急激な上昇を検出したときには、短いブロック長を表す信号を出力することにより、プリエコーを抑圧することができる。さらに、ブロック長判定器２３０は、Ｔ（４×ｉ）／Ｔ（４×ｉ＋４）の値が閾値を越えるときにも、短いブロック長を表す信号を出力する。また、信号の急激な下降を検出したときには短いブロック長を表す信号を出力することにより、ポストエコーを抑圧することができる。
【００６１】
以上のようにブロック長判定器２３０は、左チャンネルに入力されたオーディオ信号に基づいて、ジョイントステレオ符号化を適用しないで符号化する場合に対応するブロックの時間長である単一チャンネルブロック長を、左チャンネルのオーディオ信号に対応させて算出し出力する。同様に、ブロック長判定器２３１は、右チャンネルに入力されたオーディオ信号に基づいて、ジョイントステレオ符号化を適用しないで符号化する場合に対応するブロックの時間長である単一チャンネルブロック長を、右チャンネルのオーディオ信号に対応させて算出し出力する。
【００６２】
ブロック長統合判定器２４０では、ブロック長判定器２３０からの左チャンネルの単一チャンネルブロック長と、ブロック長判定器２３１からの右チャンネルの単一チャンネルブロック長とに基づいて、右チャンネルと左チャンネルのそれぞれのブロック長を決定する。
【００６３】
この左右チャンネルの各ブロック長を決定する場合、ブロック長統合判定器２４０では、二つの単一チャンネルブロック長に基づき、二つの単一チャンネルブロック長が異なる場合には、両チャンネルのブロック長を共に、二つの単一チャンネルブロック長のうち短い方のブロック長に決定して出力する。また、二つの単一チャンネルブロック長が同一の場合には、各チャンネルの単一チャンネルブロック長を、そのまま各チャンネルのブロック長（各チャンネル間で同一）に決定して出力する。
【００６４】
以上のように、両方のチャンネルをそれらの短い側のブロック長とすることにより、プリエコーやポストエコーを抑圧し、符号化効率の高いジョイントステレオ符号化の適用により音質を向上することができる。
【００６５】
なお、信号レベルの変化の検出精度を高める目的で、セグメント信号レベル算出器２００とセグメント信号レベル算出器２０１の入力として、オーディオ信号を直接入力する代わりに、高域通過フィルタ（ＨＰＦ）あるいは帯域通過フィルタ（ＢＰＦ）を通過したオーディオ信号を用いてもよい。
【００６６】
以上のように本実施の形態では、ブロック長統合判定器２４０を設けることにより、二つのチャンネルの単一チャンネルブロック長が異なるときには、両方のチャンネルのブロック長を短いブロック長で統一することにより、ジョイントステレオ符号化を適用し易くし、ジョイントステレオ符号化による符号化効率の改善および音質の向上を実現することができる。
（実施の形態２）
図４は本実施の形態２のステレオオーディオ信号高能率符号化装置における統合ブロック長決定部の構成を示すブロック図である。図４において、４００と４０１はそれぞれ左と右の各チャンネルのセグメント信号レベル算出器、４１０と４１１はそれぞれ左と右の各チャンネルのセグメント信号レベルメモリ、４２０と４２１はそれぞれ左と右の各チャンネルの信号レベル変化検出器、４３０と４３１はそれぞれ左と右の各チャンネルのブロック長判定器、４４０はブロック長統合判定器、４５０は信号レベル変化類似性判定器である。
【００６７】
図２に示した実施の形態１と図４に示した実施の形態２の構成の違いは、実施の形態２では、信号レベル変化類似性判定器４５０が追加されている点である。以上のように構成された本実施の形態２のステレオオーディオ信号高能率符号化装置における統合ブロック長決定部について、その動作を以下に述べる。なお、セグメント信号レベル算出器４００と４０１、セグメント信号レベルメモリ４１０と４１１、信号レベル変化検出器４２０と４２１、ブロック長判定器４３０と４３１の各動作は、実施の形態１の対応するブロックと同一であるので、ここでの説明は省略する。
【００６８】
信号レベル変化検出器４２０と４２１は、それぞれ左と右の各チャンネルの信号レベルの上昇と下降の変化の度合いを示す値を出力するように構成されており、信号レベル変化類似性判定器４５０では、左チャンネルの信号レベル変化検出器４２０から出力される左チャンネルの信号レベルの上昇及び下降時の変化の度合いを示す値について、ブロック長決定時に考慮しなければならない時間の中での最大値とその最大値を与える時間を求めるとともに、右チャンネルについても、同様に、その信号レベル変化検出器４２１から出力される右チャンネルの信号レベルの上昇及び下降時の変化の度合いを示す値について、最大値とその最大値を与える時間を求める。
【００６９】
両方のチャンネルの上昇及び下降時の変化量の最大値とその最大値を与える時間が所定の範囲内にあるときには、両方のチャンネルの信号は類似性があると判定し、一方、そうでないときには、両方のチャンネルの信号は類似性がないと判定して、その結果を示す信号を出力する。
【００７０】
左チャンネルのブロック長判定器４３０は、ジョイントステレオ符号化を適用しないで符号化する場合に対応する左チャンネルの単一チャンネルブロック長を出力し、同様に、右チャンネルのブロック長判定器４３１は、ジョイントステレオ符号化を適用しないで符号化する場合に対応する右チャンネルの単一チャンネルブロック長を出力するように構成されており、ブロック長統合判定器４４０では、ブロック長判定器４３０からの左チャンネルの単一チャンネルブロック長とブロック長判定器４３１からの右チャンネルの単一チャンネルブロック長と信号レベル変化類似性判定器４５０からの類似性判定結果に基づいて、右チャンネルと左チャンネルのそれぞれにおいて、ブロックの時間長を決定し、その結果を示す各チャンネルのブロック長を示す信号を出力する。
【００７１】
ブロック長統合判定器４４０では、信号レベル変化類似性判定器４５０で類似性があると判定され、かつ二つの単一チャンネルブロック長が異なる場合には、両チャンネルのブロック長を共に、二つの単一チャンネルブロック長のうち短い方のブロックの時間長に決定して、その結果信号を出力する。上記以外の場合には、各チャンネルの単一チャンネルブロック長をそのまま各チャンネルのブロックの時間長に決定して結果を出力する。
【００７２】
以上のように二つのチャンネルの信号レベルの変化に類似性がある場合で、かつ二つのチャンネルの単一チャンネルブロック長が異なる場合には、両方のチャンネルのブロックを短い方の時間長とすることにより、プリエコーやポストエコーを抑圧し、符号化効率の高いジョイントステレオ符号化の適用により音質を向上することができる。
【００７３】
また、前記の類似性がない場合には、ジョイントステレオ符号化による符号化効率改善が期待されないので、それぞれのチャンネル信号に適したブロックの時間長で符号化する。
【００７４】
以上のように本実施の形態では、ブロック長統合判定器４４０と信号レベル変化類似性判定器４５０とを設けることにより、二つのチャンネルの信号レベルの変化に類似性があり、かつそれらの単一チャンネルブロック長が異なるときには、両方のチャンネルのブロックを短い時間長に統一することにより、ジョイントステレオ符号化を適用し易くし、ジョイントステレオ符号化による符号化効率の改善により音質を向上することができる。
（実施の形態３）
図５は本実施の形態３のステレオオーディオ信号高能率符号化装置における統合ブロック長決定部の構成を示すブロック図である。図５において、５００と５０１はそれぞれ左と右の各チャンネルのセグメント信号レベル算出器、５１０と５１１はそれぞれ左と右の各チャンネルのセグメント信号レベルメモリ、５２０と５２１はそれぞれ左と右の各チャンネルの信号レベル変化検出器、５３０と５３１はそれぞれ左と右の各チャンネルのブロック長判定器、５４０はブロック長統合判定器、５５０は信号レベル類似性判定器である。
【００７５】
図２に示した実施の形態１と図５に示した実施の形態３の構成の違いは、実施の形態３では、信号レベル類似性判定器５５０が追加されている点である。
以上のように構成された本実施の形態３のステレオオーディオ信号高能率符号化装置における統合ブロック長決定部について、その動作を以下に述べる。なお、セグメント信号レベル算出器５００と５０１、セグメント信号レベルメモリ５１０と５１１、信号レベル変化検出器５２０と５２１、ブロック長判定器５３０と５３１の各動作は、実施の形態１の対応するブロックと同一であるので、ここでの説明は省略する。
【００７６】
信号レベル類似性判定器５５０では、左チャンネルのセグメント信号レベルメモリ５１０と、右チャンネルのセグメント信号レベルメモリ５１１とから、それぞれのチャンネルの信号レベルを読み出して両者を比較することにより、両方のチャンネルの信号レベルの類似性を判定して、その結果信号を出力する。
【００７７】
すなわち、信号レベル類似性判定器５５０では、二つののチャンネルの対応する時間が同一のセグメントの信号レベルの和に対する信号レベルの差の絶対値の比が所定の閾値以下となるときには、二つのチャンネルの信号レベルは類似性があると判定し、そうでないときには、二つのチャンネルの信号は類似性がないと判定し、その結果信号を出力する。
【００７８】
左チャンネルのブロック長判定器５３０は、ジョイントステレオ符号化を適用しないで符号化する場合に対応する左チャンネルの単一チャンネルブロック長を出力し、同様に、右チャンネルのブロック長判定器５３１は、ジョイントステレオ符号化を適用しないで符号化する場合に対応する右チャンネルの単一チャンネルブロック長を出力するように構成されており、ブロック長統合判定器５４０では、ブロック長判定器５３０からの左チャンネルの単一チャンネルブロック長とブロック長判定器５３１からの右チャンネルの単一チャンネルブロック長と信号レベル類似性判定器５５０からの類似性判定結果に基づいて、右チャンネルと左チャンネルのそれぞれにおいて、ブロックの時間長を決定し、結果を示す各ｃｈのブロック長を示す信号を出力する。
【００７９】
ブロック長統合判定器５４０では、信号レベル類似性判定器５５０で類似性があると判定され、かつ二つの単一チャンネルブロック長が異なる場合には、両チャンネルのブロック長を共に、二つの単一チャンネルブロック長のうち短い方のブロックの時間長に決定して、その結果信号を出力する。上記以外の場合には、各チャンネルの単一チャンネルブロック長をそのまま各チャンネルのブロックの時間長に決定して結果を出力する。
【００８０】
以上のように二つのチャンネルの信号レベルに類似性がある場合で、かつ二つのチャンネルの単一チャンネルブロック長が異なる場合には、両方のチャンネルのブロック長を短い方の時間長とすることにより、プリエコーやポストエコーを抑圧し、符号化効率の高いジョイントステレオ符号化の適用により音質を向上することができる。
【００８１】
また、前記の類似性がない場合には、ジョイントステレオ符号化による符号化効率改善が期待されないので、それぞれのチャンネル信号に適したブロックの時間長で符号化する。
【００８２】
以上のように本実施の形態では、ブロック長統合判定器５４０と信号レベル類似性判定器５５０とを設けることにより、二つのチャンネルの信号レベルに類似性があり、かつそれらの単一チャンネルブロック長が異なるときには、両方のチャンネルのブロックを短い時間長で統一することにより、ジョイントステレオ符号化を適用し易くし、ジョイントステレオ符号化による符号化効率の改善により音質を向上することができる。
【００８３】
【発明の効果】
以上のように本発明によれば、二つのチャンネルの単一チャンネルブロック長に基づき、両者を統合して各チャンネルのブロックの時間長を決定することができる。
【００８４】
また、二つのチャンネルの信号レベルの変化の類似性を考慮して、この信号レベルの変化の類似性により、ジョイントステレオ符号化が効率的に動作することが期待できる場合においてのみ、二つのチャンネルの単一チャンネルブロック長に基づいて各チャンネルのブロックの時間長を決定することができる。
【００８５】
また、二つのチャンネルの信号レベルの類似性を考慮して、この信号レベルの類似性により、ジョイントステレオ符号化が効率的に動作することが期待できる場合においてのみ、二つのチャンネルの単一チャンネルブロック長に基づいて各チャンネルのブロックの時間長を決定することができる。
【００８６】
以上により、ステレオオーディオ信号を、ジョイントステレオ符号化方式に対して、従来にくらべてより良好に適用させることができ、その適用によってステレオオーディオ信号に対する符号化効率を改善して、この符号化信号に基づいて得られるステレオオーディオ信号の音質を向上することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態のステレオオーディオ信号高能率符号化装置の全体構成を示すブロック図
【図２】本発明の実施の形態１のステレオオーディオ信号高能率符号化装置における統合ブロック長決定部の構成を示すブロック図
【図３】本発明の実施の形態におけるブロック長とセグメントとの関係説明図
【図４】本発明の実施の形態２のステレオオーディオ信号高能率符号化装置における統合ブロック長決定部の構成を示すブロック図
【図５】本発明の実施の形態３のステレオオーディオ信号高能率符号化装置における統合ブロック長決定部の構成を示すブロック図
【図６】従来のステレオオーディオ信号高能率符号化装置の全体構成を示すブロック図
【図７】同従来例における動作を説明するためのブロック長とプリエコーとの関係説明図
【図８】同従来例における左右チャンネルのブロック長決定部の構成を示すブロック図
【図９】同従来例における動作を説明するためのミッド／サイドステレオ符号化の方式説明図
【符号の説明】
１００，１０１周波数変換部
１１０統合ブロック長決定部
１２０ジョイントステレオ信号生成部
１３０量子化及び符号化部
２００，２０１，４００，４０１，５００，５０１セグメント信号レベル算出器
２１０，２１１，４１０，４１１，５１０，５１１セグメント信号レベルメモリ
２２０，２２１，４２０，４２１，５２０，５２１信号レベル変化検出器
２３０，２３１，４３０，４３１，５３０，５３１ブロック長判定器
２４０，４４０，５４０ブロック長統合判定器
４５０信号レベル変化類似性判定器
５５０信号レベル類似性判定器
[0001]
[Technical Field of the Invention]
The present invention relates to a stereo audio signal high-efficiency encoding apparatus that divides a stereo audio signal into blocks and encodes them with high efficiency.
[0002]
[Prior art]
In recent years, methods based on transform coding have been widely used for high-efficiency encoding of audio signals. Examples include MPEG-2 AAC (Advanced Audio Coding) and Dolby Digital. Another high-efficiency encoding scheme is hybrid encoding, which combines a subband filter with transform coding. Examples of schemes using hybrid encoding include Layer 3 of MPEG-1 and MPEG-2.
[0003]
A conventional stereo audio signal high-efficiency encoding apparatus that performs high-efficiency encoding on a stereo audio signal using transform encoding among the above methods will be described below.
[0004]
FIG. 6 is a block diagram showing the overall configuration of a stereo audio signal high-efficiency encoding apparatus using transform encoding. In FIG. 6, 600 and 601 are frequency conversion units, 610 and 611 are block length determination units, 620 is a joint stereo signal generation unit, and 630 is a quantization and encoding unit.
[0005]
The operation of the stereo audio signal high-efficiency encoding apparatus configured as described above will be described below.
First, in the frequency conversion unit 600, the input left-channel (Lch) time-domain audio signal is divided into blocks of the length determined by the block length determination unit 610, orthogonally transformed, and converted into frequency-domain spectral coefficients.
[0006]
In the case of the hybrid coding method, by contrast, the input audio signal is divided into subband signals by a filter bank, and each subband signal is divided into blocks of the length determined by the block length determination unit 610. Each block is then orthogonally transformed in the same way and converted into frequency-domain spectral coefficients.
[0007]
As orthogonal transform in this case, MDCT (modified discrete cosine transform) and FFT (fast Fourier transform) are mainly used.
Similarly, in the frequency conversion unit 601, the input right-channel (Rch) time-domain audio signal is divided into blocks of the length determined by the block length determination unit 611, orthogonally transformed, and converted into frequency-domain spectral coefficients.
[0008]
Block length determination units 610 and 611 determine the time length (block length) of the block at the time of frequency conversion based on the signal of each channel. Also, the block length determination units 610 and 611 prevent deterioration of sound quality due to a pseudo signal called pre-echo by changing the block length, that is, the orthogonal transform length, according to the change of the signal of each channel.
[0009]
FIG. 7 is an explanatory diagram showing the relationship between the block length and the pre-echo in the conventional stereo audio signal high efficiency coding apparatus. In the figure, MDCT is used as orthogonal transform, and adjacent blocks overlap by 50%.
[0010]
FIG. 7A shows an input signal waveform. FIG. 7B shows the signal waveform obtained by encoding and decoding the input signal of FIG. 7A using a long block length, together with the window waveform used for the transform. FIG. 7C shows the signal waveform obtained by encoding and decoding the input signal of FIG. 7A using a short block length, together with the window waveform used for the transform. In FIG. 7B, TL denotes the duration of a long block, and in FIG. 7C, TS denotes the duration of a short block. In FIG. 7, TS is one quarter of TL.
[0011]
When a signal containing a sudden rise, as shown in FIG. 7A, is transform-coded with a long block length as in FIG. 7B, the quantization noise caused by the large-amplitude part spreads into the small-amplitude part and produces a pseudo signal. When the signal is transform-coded with a short block length as shown in FIG. 7C, the quantization noise caused by the large-amplitude part is confined within the short block.
[0012]
Quantization noise caused by a large-amplitude signal occurs throughout the transform block. However, because auditory forward masking acts over a longer time than backward masking, noise that follows a large signal is harder to perceive than noise that precedes it.
[0013]
Since noise generated before a large signal is heard before the original signal is heard, it is called a pre-echo and greatly deteriorates the quality. Noise generated after a large signal is called post-echo.
[0014]
Therefore, in the transform coding scheme and the hybrid coding scheme, the pre-echo is suppressed by selecting a short block length for a sudden rising signal. In addition, the post-echo is suppressed by selecting a short block length for the steep falling signal. However, since the post-echo is less audible than the pre-echo as described above, the post-echo may not be suppressed.
[0015]
As block length determination units 610 and 611 in a conventional stereo audio signal high-efficiency encoding device, for example, one described in Japanese Patent Laid-Open No. 3-263926 is known.
[0016]
FIG. 8 is a block diagram showing the configuration of the left channel block length determination unit 610 and the right channel block length determination unit 611 in the conventional stereo audio signal high-efficiency encoding apparatus. In FIG. 8, 800 and 801 are segment signal level calculators, 810 and 811 are segment signal level memories, 820 and 821 are signal level change detectors, and 830 and 831 are block length determiners. The left channel block length determination unit 610 consists of the segment signal level calculator 800, the segment signal level memory 810, the signal level change detector 820, and the block length determiner 830. The right channel block length determination unit 611 consists of the segment signal level calculator 801, the segment signal level memory 811, the signal level change detector 821, and the block length determiner 831.
[0017]
Since the left channel block length determination unit 610 and the right channel block length determination unit 611 are identical in configuration and operation, only the operation of the left channel block length determination unit 610 is described below; the description of the right channel block length determination unit 611 is omitted.
[0018]
First, the segment signal level calculator 800 divides the input left-channel audio signal into segments shorter than the shortest block and calculates the signal level of each segment as the sum of the squared values of the signal within the segment, that is, its energy.
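The segment level calculation described in this paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `segment_energies`, the choice to drop a trailing partial segment, and the 4-sample segment length in the example are all assumptions made for brevity.

```python
def segment_energies(samples, segment_len):
    """Split samples into consecutive segments and return each segment's energy.

    Energy is the sum of squared sample values. A trailing partial segment
    is dropped (an assumption of this sketch, not stated in the source).
    """
    n_segments = len(samples) // segment_len
    return [
        sum(x * x for x in samples[i * segment_len:(i + 1) * segment_len])
        for i in range(n_segments)
    ]


# Quiet start followed by a sudden attack; 4-sample segments keep it short.
signal = [0.0, 0.1, -0.1, 0.0, 1.0, -1.0, 1.0, -1.0]
levels = segment_energies(signal, 4)
```

Here the second segment carries far more energy than the first, which is the kind of jump the signal level change detector looks for.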
[0019]
The segment signal level memory 810 stores the signal level of each segment calculated by the segment signal level calculator 800. The signal level change detector 820 uses the signal level of the segment read from the segment signal level memory 810 to determine and output the ratio of the signal levels of adjacent segments.
[0020]
The block length determiner 830 outputs a signal representing a short block length when the ratio of the signal levels from the signal level change detector 820 exceeds a threshold value, and outputs a signal representing a long block length otherwise.
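The conventional per-channel decision in paragraphs [0019] and [0020] might look like the sketch below. The function name, the `"short"`/`"long"` labels, the threshold value used in any call, and the small `eps` guard against division by zero are assumptions for illustration; the source only states that the ratio of adjacent segment levels is compared with a threshold.

```python
def block_length_for_channel(levels, threshold, eps=1e-12):
    """Return "short" if any adjacent-segment level ratio exceeds threshold.

    levels: per-segment signal levels (for example, energies) in time order.
    eps: hypothetical guard so a silent previous segment does not divide by zero.
    """
    for prev, cur in zip(levels, levels[1:]):
        ratio = cur / max(prev, eps)
        if ratio > threshold:
            return "short"  # sudden rise detected
    return "long"
```

Because only the ratio of a segment to its predecessor is tested, this sketch reacts to rises but not to falls, matching the prior-art behaviour described here.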
[0021]
As described above, when a sudden increase in signal is detected, a pre-echo can be suppressed by outputting a signal representing a short block length.
The joint stereo signal generation unit 620 in FIG. 6 takes the block length and the spectral coefficients as inputs and generates the joint stereo signal required for joint stereo encoding. Here, the joint stereo signal is either the sum and difference signals of the left- and right-channel frequency spectra required for mid/side stereo (sum/difference) encoding, or the sum signal of the left- and right-channel frequency spectra required for intensity stereo encoding.
[0022]
FIG. 9 is an explanatory diagram of mid/side stereo encoding in the conventional stereo audio signal high-efficiency encoding apparatus. As shown in FIG. 9, instead of encoding the frequency spectra of the left and right channels directly, mid/side stereo encoding encodes a signal equal to half the sum of the left- and right-channel frequency spectra (the mid signal, or sum signal) and a signal equal to half their difference (the side signal, or difference signal).
[0023]
In mid/side stereo, as shown in FIG. 9, when the frequency spectra of the left and right channels are similar, encoding the mid/side frequency spectra requires fewer bits than encoding the left/right frequency spectra directly. Applying mid/side stereo encoding in this way improves coding efficiency and sound quality.
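The mid/side transform described above can be written out directly. The following is a small sketch with made-up spectral values; the function names are illustrative only.

```python
def to_mid_side(left, right):
    """Mid = (L + R) / 2 and Side = (L - R) / 2, per spectral coefficient."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Inverse transform: L = Mid + Side and R = Mid - Side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Nearly identical channels: the side signal is almost all zeros.
left = [4.0, -2.0, 1.0, 0.5]
right = [4.0, -2.5, 1.0, 0.5]
mid, side = to_mid_side(left, right)
```

When the two channels are similar, most side coefficients are zero or small, which is why fewer bits are needed than for coding left and right separately.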
[0024]
Intensity stereo exploits the fact that, above a predetermined frequency (usually 3 kHz to 6 kHz), the spectral envelope matters more to hearing than the fine structure of the spectrum. Above that frequency, only the sum signal of the left- and right-channel frequency spectra is encoded as frequency spectrum information, and only the envelope information is encoded separately for the left and right channels, which improves coding efficiency.
[0025]
As envelope information, the ratio of the energy of the sum signal and the signal of each channel is sent. Intensity stereo coding can achieve higher coding efficiency than mid / side stereo coding, but sound quality may deteriorate due to the inability to reproduce the fine structure of the spectrum.
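As a rough illustration of the intensity stereo idea, the sketch below treats one band of spectral coefficients. Several details are assumptions not given in the source: the ratio is taken as channel energy divided by sum-signal energy, the decoder rescales the shared sum spectrum by the square root of that ratio, and `eps` guards against an all-zero band. Real codecs apply this per frequency band and quantize the ratios.

```python
import math


def intensity_encode(left, right, eps=1e-12):
    """Return the sum spectrum and per-channel energy ratios (channel / sum)."""
    total = [l + r for l, r in zip(left, right)]
    e_sum = max(sum(x * x for x in total), eps)
    ratio_l = sum(x * x for x in left) / e_sum
    ratio_r = sum(x * x for x in right) / e_sum
    return total, ratio_l, ratio_r


def intensity_decode(total, ratio_l, ratio_r):
    """Rescale the shared sum spectrum so each channel regains its energy."""
    left = [x * math.sqrt(ratio_l) for x in total]
    right = [x * math.sqrt(ratio_r) for x in total]
    return left, right
```

If the two channels differ only in level, the decoded channels match the originals; if their fine structure differs, only the energy per channel is preserved, which is the source of the quality loss mentioned above.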
[0026]
In the quantization and encoding unit 630 of FIG. 6, for the left/right stereo signal or the joint stereo signal from the joint stereo signal generation unit 620, the masking level of the spectral coefficients, that is, the allowable quantization noise level, is calculated based on an auditory model. The spectral coefficients are then quantized based on the calculated allowable quantization noise level, encoding such as Huffman coding is applied, and high-efficiency encoded data is output.
[0027]
[Problems to be solved by the invention]
However, in the conventional stereo audio signal high-efficiency encoding apparatus described above, as shown in FIG. 8, the Lch block length determination unit 610 and the Rch block length determination unit 611 each determine the block length for the input signal independently for their own channel. As a result, when the block lengths determined for the left and right channels differ, the quantization and encoding unit 630 cannot apply joint stereo encoding with high coding efficiency, such as mid/side stereo encoding or intensity stereo encoding, and the sound quality may be degraded compared with the case where joint stereo encoding is applied.
[0028]
The present invention solves the above conventional problem. It provides a stereo audio signal high-efficiency encoding apparatus that allows joint stereo encoding to be applied to a stereo audio signal more readily than before, thereby improving the coding efficiency for the stereo audio signal and the sound quality of the stereo audio signal obtained from the encoded signal.
[0029]
[Means for Solving the Problems]
To solve the above problems, the stereo audio signal high-efficiency encoding apparatus of the present invention takes a stereo audio signal as an input signal, divides the audio signal of each channel into blocks, and encodes it with high efficiency by joint stereo encoding based on the block length. The apparatus comprises: means for dividing the audio signal of each channel into segments of a predetermined time width and calculating the signal level of each segment; means for detecting, independently for each channel, a sudden rise or fall of the audio signal from the change in the segment signal levels; means for calculating, independently for the audio signal of each channel and based on the amount of change in the segment signal levels, a single-channel block length, which is the block time length that applies when joint stereo encoding is not used; and means for setting the block time length of both channels to the shorter of the single-channel block lengths when the single-channel block lengths of the channels differ, and otherwise using the single-channel block length of each channel unchanged as the block time length of that channel.
[0030]
As described above, based on the single channel block length of the two channels, the time length of the block of each channel can be determined by integrating both.
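The integration rule in the preceding paragraphs reduces to a few lines. In this sketch block lengths are expressed in samples; the function name is hypothetical, and the values 1024 and 256 in the usage note are the long and short block sizes given for the embodiment in paragraph [0054] of the original text.

```python
def integrate_block_lengths(left_len, right_len):
    """Combine two single-channel block lengths into the final per-channel lengths.

    If the two lengths differ, both channels use the shorter one; otherwise
    each channel keeps its own (identical) single-channel block length.
    """
    if left_len != right_len:
        shorter = min(left_len, right_len)
        return shorter, shorter
    return left_len, right_len
```

For example, `integrate_block_lengths(1024, 256)` gives both channels a 256-sample block, so the two spectra share one block length and joint stereo encoding remains applicable.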
[0031]
[Embodiments of the Invention]
The stereo audio signal high-efficiency encoding apparatus according to claim 1 of the present invention takes a stereo audio signal as an input signal, divides the audio signal of each channel into blocks, and encodes it with high efficiency by joint stereo encoding based on the block length. The apparatus comprises: means for dividing the audio signal of each channel into segments of a predetermined time width and calculating the signal level of each segment; means for detecting, independently for each channel, a sudden rise or fall of the audio signal from the change in the segment signal levels; means for calculating, independently for the audio signal of each channel and based on the amount of change in the segment signal levels, a single-channel block length, which is the block time length that applies when joint stereo encoding is not used; and means for setting the block time length of both channels to the shorter of the single-channel block lengths when the single-channel block lengths of the channels differ, and otherwise using the single-channel block length of each channel unchanged as the block time length of that channel.
[0033]
According to these structures, based on the single channel block length of two channels, both are integrated and the block length of each channel is determined.
The stereo audio signal high-efficiency encoding apparatus according to claim 2 takes a stereo audio signal as an input signal, divides the audio signal of each channel into blocks, and encodes it with high efficiency by joint stereo encoding based on the block length. The apparatus comprises: means for dividing the audio signal of each channel into segments of a predetermined time width and calculating the signal level of each segment; means for detecting, independently for each channel, a sudden rise or fall of the audio signal from the change in the segment signal levels; means for calculating, independently for the audio signal of each channel and based on the amount of change in the segment signal levels, a single-channel block length, which is the block time length that applies when joint stereo encoding is not used; means for comparing the changes in the segment signal levels between the channels and determining whether the signal level changes are similar; and means for setting the block time length of both channels to the shorter of the single-channel block lengths when the signal level changes of the channels are similar and the single-channel block lengths of the channels differ, and otherwise using the single-channel block length of each channel unchanged as the block time length of that channel.
[0034]
The stereo audio signal high-efficiency encoding apparatus according to claim 3 is the apparatus of claim 2 configured to determine that the changes in the signal levels of the channels are similar when, between the channels, the difference in the signal level change and the difference in time at the point where the segment signal level changes most sharply are within predetermined ranges.
[0036]
According to these configurations, the similarity of the signal level changes of the two channels is taken into account, and the block length of each channel is determined from the single channel block lengths of the two channels only when that similarity means joint stereo coding can be expected to operate efficiently.
[0037]
The stereo audio signal high-efficiency encoding apparatus according to claim 4 takes a stereo audio signal as input, divides the audio signal of each channel into blocks, and encodes it with high efficiency by joint stereo encoding based on the block length. The apparatus comprises: means for dividing the audio signal of each channel into segments of a predetermined time width and calculating the signal level of each segment; means for detecting, independently for each channel, a sudden rise or fall of the audio signal from the change in the segment signal levels; means for calculating, independently for the audio signal of each channel and based on the amount of change in the segment signal levels, a single channel block length, which is the block time length that would be used if joint stereo encoding were not applied; means for determining the similarity of the signal levels by comparing the segment signal levels between the channels; and means for determining the block time length of each channel to be the shorter of the single channel block lengths when the signal levels of the channels are similar and the single channel block lengths of the channels differ, and otherwise adopting the single channel block length of each channel unchanged as the block time length of that channel.
[0038]
The stereo audio signal high-efficiency encoding apparatus according to claim 5 is the apparatus of claim 4 in which the similarity of the signal levels between the channels is determined using the ratio between the sum of the signal levels of the corresponding segments in each channel and the difference between those signal levels.
[0040]
According to these configurations, the similarity of the signal levels of the two channels is taken into account, and the block length of each channel is determined from the single channel block lengths of the two channels only when that similarity means joint stereo coding can be expected to operate efficiently.
[0044]
Hereinafter, a stereo audio signal high-efficiency encoding apparatus showing an embodiment of the present invention will be specifically described with reference to the drawings.
FIG. 1 is a block diagram showing the overall configuration of the stereo audio signal high-efficiency encoding apparatus of the present embodiment. In FIG. 1, 100 and 101 are frequency conversion units, 110 is an integrated block length determination unit, 120 is a joint stereo signal generation unit, and 130 is a quantization and coding unit.
[0045]
The operation of the stereo audio signal high-efficiency encoding apparatus configured as described above will be described below.
In the frequency conversion unit 100, the input left-channel (Lch) time-axis audio signal is divided into blocks of the length determined by the integrated block length determination unit 110 and orthogonally transformed into frequency-axis spectral coefficients. Similarly, in the frequency conversion unit 101, the input right-channel (Rch) time-axis audio signal is divided into blocks of the length determined by the integrated block length determination unit 110 and orthogonally transformed into frequency-axis spectral coefficients. In this embodiment, MDCT is used as the orthogonal transform.
[0046]
The integrated block length determination unit 110 determines the time length of the block at the time of frequency conversion of the left channel and the right channel based on the signals of both the left and right channels. In addition, the integrated block length determination unit 110 determines the block length, that is, the orthogonal transform length in consideration of the application of joint stereo coding, and prevents deterioration of sound quality due to a pseudo signal called pre-echo or post-echo.
[0047]
In the case of the hybrid coding scheme, the input audio signal is divided into subband signals by a filter bank (not shown), and each subband signal is divided into blocks of the length determined by the integrated block length determination unit 110, orthogonally transformed, and converted into spectral coefficients.
[0048]
The joint stereo signal generation unit 120 receives the block lengths and spectral coefficients of the left and right channels as input, and generates a joint stereo signal necessary for joint stereo encoding in the quantization and encoding unit 130. As this joint stereo coding, in this embodiment, mid / side stereo coding and intensity stereo coding are used.
[0049]
That is, when mid / side stereo coding is applied in the joint stereo signal generation unit 120, a sum signal and a difference signal of the frequency spectra of the left channel and the right channel are generated, and intensity stereo coding is applied. In this case, only the sum signal of the frequency spectrum of the left channel and the right channel is generated.
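The sum and difference signals described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `mid_side` and the use of plain Python lists for spectral coefficients are assumptions, and no scaling is applied because the text specifies none (practical codecs commonly halve both signals).

```python
def mid_side(left, right):
    """Return (sum, difference) spectra for two equal-length coefficient lists.

    Hypothetical helper: the source only states that a sum signal and a
    difference signal of the left and right spectra are generated.
    """
    if len(left) != len(right):
        raise ValueError("left and right spectra must have the same length")
    sum_signal = [l + r for l, r in zip(left, right)]
    diff_signal = [l - r for l, r in zip(left, right)]
    return sum_signal, diff_signal


# Usage: for intensity stereo, only the sum signal would be kept.
s, d = mid_side([1.0, 2.0], [0.5, -2.0])
```

Because both channels must share one set of coefficient positions for the element-wise sum and difference to be meaningful, the two channels need the same block length, which is what the integrated block length determination provides.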
[0050]
The quantization and encoding unit 130 calculates the masking level of the spectrum coefficient, that is, the allowable quantization noise level based on the auditory model for the left and right stereo signal or the joint stereo signal from the joint stereo signal generation unit 120. Then, based on the calculated allowable quantization noise level, the spectrum coefficient is quantized, the encoding process such as Huffman encoding is performed, and the highly efficient encoded data is output.
[0051]
The stereo audio signal high-efficiency encoding apparatus of the embodiment uses the integrated block length determination unit, which is the feature of the present invention. It is described in detail below through several configuration examples of the integrated block length determination unit 110.
(Embodiment 1)
FIG. 2 is a block diagram showing a configuration of an integrated block length determination unit in the stereo audio signal high efficiency coding apparatus according to the first embodiment. In FIG. 2, 200 and 201 are segment signal level calculators for the left and right channels, 210 and 211 are segment signal level memories for the left and right channels, 220 and 221 are signal level change detectors for the left and right channels, 230 and 231 are block length determiners for the left and right channels, respectively, and 240 is a block length integrated decision unit.
[0052]
The operation of the integrated block length determination unit in the stereo audio signal high-efficiency encoding apparatus according to Embodiment 1 configured as described above will be described below.
The segment signal level calculator 200 divides the audio signal input to the left channel into segments whose duration is equal to or shorter than that of the shortest block, and calculates the signal level of each segment as the sum of the squared values of the signal in the segment.
[0053]
FIG. 3 shows the relationship between blocks and segments in the present embodiment. In the figure, a long block, a short block, and a segment are shown in order from the top, and a broken line indicates a relative time relationship. In this embodiment, since MDCT is used as orthogonal transform, each block overlaps by 50%.
[0054]
In general, a long block is composed of 512 to 2048 samples and a short block is composed of 64 to 256 samples. In this embodiment, a long block consists of 1024 samples, a short block of 256 samples, and a segment of 128 samples. That is, the short block length (TS) is 1/4 of the long block length (TL), and the segment length (T) is 1/2 of the short block length (TS).
[0055]
Note that, in order to reduce the dynamic range or the processing amount, the segment signal level calculated by the segment signal level calculator 200 may be the sum of the absolute values of the signal in the segment, or the maximum absolute value of the signal, instead of the sum of the squared values. In the case of the hybrid coding scheme, a subband signal that is the output of a subband filter (bandpass filter) is input to the segment signal level calculator 200.
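The three segment level measures mentioned for the segment signal level calculator can be sketched as below. The function name `segment_levels`, the `mode` labels, and the choice to drop a trailing partial segment are assumptions made for illustration; the 128-sample default follows the segment length of this embodiment.

```python
def segment_levels(samples, segment_len=128, mode="energy"):
    """Split samples into fixed-length segments and return one level per segment.

    mode: "energy"  -> sum of squared values (the default measure in the text)
          "abs_sum" -> sum of absolute values
          "abs_max" -> maximum absolute value
    A trailing partial segment is ignored (an assumption of this sketch).
    """
    levels = []
    for start in range(0, len(samples) - segment_len + 1, segment_len):
        seg = samples[start:start + segment_len]
        if mode == "energy":
            levels.append(sum(x * x for x in seg))
        elif mode == "abs_sum":
            levels.append(sum(abs(x) for x in seg))
        elif mode == "abs_max":
            levels.append(max(abs(x) for x in seg))
        else:
            raise ValueError("unknown mode: " + mode)
    return levels


# Usage with a tiny segment length so the values are easy to check by hand.
print(segment_levels([1, -2, 3, -4], segment_len=2))
```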
[0056]
Next, the segment signal level memory 210 stores the signal level of each segment calculated by the segment signal level calculator 200 in the memory. Here, the signal level of segment i is denoted S(i).
[0057]
To detect the rise of the signal, the signal level change detector 220 uses the signal levels read from the segment signal level memory 210 to obtain the ratio of the signal level S(i) of segment i to the signal level S(i-1) of the immediately preceding segment, that is, S(i) / S(i-1), and outputs the value.
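As a rough sketch of the rise measure, the ratio S(i) / S(i-1) can be computed for every pair of consecutive segment levels. The function name `rise_ratios` and the small floor `eps`, which avoids division by zero for silent segments, are assumptions not found in the source.

```python
def rise_ratios(levels, eps=1e-12):
    """Return S(i) / S(i-1) for i = 1 .. len(levels) - 1.

    eps is a hypothetical floor on the denominator so that a silent
    preceding segment does not cause a division by zero.
    """
    return [levels[i] / max(levels[i - 1], eps) for i in range(1, len(levels))]


# Usage: a level that jumps from 2.0 to 8.0 gives a ratio of 4.0.
print(rise_ratios([1.0, 2.0, 8.0]))
```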
[0058]
To detect the fall of the signal, the signal level change detector 220 uses the signal levels read from the segment signal level memory 210 to obtain the sum T(4 × i) of the signal levels of four segments, and outputs the ratio of these sums for adjacent groups of four segments, that is, the value of T(4 × i) / T(4 × i + 4). The sum over four segments is used for the fall because, as described above, auditory forward masking acts over a longer time than backward masking.
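The fall measure can be sketched in the same way. Here T(4 × i) is assumed to be the sum of the four segment levels starting at segment 4 × i, which is one plausible reading of the text; the function name `fall_ratios` and the floor `eps` are likewise assumptions.

```python
def fall_ratios(levels, group=4, eps=1e-12):
    """Return T(4i) / T(4i + 4) for consecutive groups of `group` segments.

    T(k) is taken as levels[k] + ... + levels[k + group - 1] (an assumed
    reading). A large ratio means the later group is much quieter, i.e.
    the signal has fallen. Incomplete trailing groups are ignored.
    """
    sums = [sum(levels[k:k + group])
            for k in range(0, len(levels) - group + 1, group)]
    return [sums[j] / max(sums[j + 1], eps) for j in range(len(sums) - 1)]


# Usage: four loud segments followed by four quiet ones.
print(fall_ratios([4, 4, 4, 4, 1, 1, 1, 1]))
```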
[0059]
Next, the block length determiner 230 outputs a signal representing a short block length when the value of S(i) / S(i-1) from the signal level change detector 220 exceeds a threshold value, and otherwise outputs a signal representing a long block length.
[0060]
In this way, when a sudden rise of the signal is detected, pre-echo can be suppressed by outputting a signal representing a short block length. The block length determiner 230 also outputs a signal representing a short block length when the value of T(4 × i) / T(4 × i + 4) exceeds a threshold value, so that when a sudden fall of the signal is detected, post-echo can be suppressed in the same way.
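The decision of the block length determiner can be sketched as a pair of threshold tests. The source gives no threshold values, so `rise_threshold` and `fall_threshold` below are purely hypothetical, as is the function name; the block lengths of 1024 and 256 samples follow this embodiment.

```python
LONG_BLOCK = 1024   # long block length in samples (this embodiment)
SHORT_BLOCK = 256   # short block length in samples (this embodiment)


def single_channel_block_length(rise, fall,
                                rise_threshold=8.0, fall_threshold=8.0):
    """Choose the block length for one channel, ignoring joint stereo.

    rise: values of S(i) / S(i-1) for the current analysis span
    fall: values of T(4i) / T(4i + 4) for the same span
    Both threshold defaults are hypothetical placeholders.
    """
    if any(r > rise_threshold for r in rise):
        return SHORT_BLOCK   # sudden rise: short block suppresses pre-echo
    if any(f > fall_threshold for f in fall):
        return SHORT_BLOCK   # sudden fall: short block suppresses post-echo
    return LONG_BLOCK


# Usage: a tenfold rise between two segments selects the short block.
print(single_channel_block_length([1.0, 10.0], [1.0]))
```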
[0061]
As described above, the block length determiner 230 calculates, from the audio signal input to the left channel, a single channel block length, which is the block time length that would be used when encoding without joint stereo encoding, and outputs it for the left-channel audio signal. Similarly, the block length determiner 231 calculates a single channel block length from the audio signal input to the right channel and outputs it for the right-channel audio signal.
[0062]
The block length integrated decision unit 240 determines the block length of each of the left and right channels based on the single channel block length of the left channel from the block length determiner 230 and the single channel block length of the right channel from the block length determiner 231.
[0063]
The block length integrated decision unit 240 determines the block lengths of the left and right channels from the two single channel block lengths. When the two single channel block lengths differ, it sets the block length of both channels to the shorter of the two and outputs it. When they are the same, it adopts the single channel block length of each channel unchanged as the block length of that channel (the same for both channels) and outputs it.
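The integration rule of Embodiment 1 reduces to taking the shorter of the two single channel block lengths for both channels. The sketch below assumes block lengths are expressed in samples and uses a hypothetical function name.

```python
def integrate_block_lengths(left_len, right_len):
    """Return (left, right) block lengths after integration (Embodiment 1).

    If the two single channel block lengths differ, both channels use the
    shorter one; if they are equal, each is passed through unchanged.
    """
    if left_len != right_len:
        shorter = min(left_len, right_len)
        return shorter, shorter
    return left_len, right_len


# Usage: one channel wants long blocks, the other short blocks.
print(integrate_block_lengths(1024, 256))
```

Either way the two channels end up with the same block length, so joint stereo coding can always be applied in this embodiment.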
[0064]
As described above, by setting both channels to block lengths on the short side, pre-echo and post-echo can be suppressed, and sound quality can be improved by applying joint stereo coding with high coding efficiency.
[0065]
For the purpose of improving the detection accuracy of the signal level change, instead of directly inputting an audio signal as an input to the segment signal level calculator 200 and the segment signal level calculator 201, a high pass filter (HPF) or a band pass is used. An audio signal that has passed through a filter (BPF) may be used.
[0066]
As described above, in the present embodiment, by providing the block length integrated determination unit 240, when the single channel block lengths of two channels are different, the block lengths of both channels are unified with a short block length, It is easy to apply joint stereo coding, and improvement of coding efficiency and sound quality by joint stereo coding can be realized.
(Embodiment 2)
FIG. 4 is a block diagram showing a configuration of an integrated block length determination unit in the stereo audio signal high efficiency coding apparatus according to the second embodiment. In FIG. 4, 400 and 401 are segment signal level calculators for the left and right channels, 410 and 411 are segment signal level memories for the left and right channels, 420 and 421 are signal level change detectors for the left and right channels, 430 and 431 are block length determiners for the left and right channels, respectively, 440 is a block length integrated decision unit, and 450 is a signal level change similarity determiner.
[0067]
The difference in configuration between the first embodiment shown in FIG. 2 and the second embodiment shown in FIG. 4 is that a signal level change similarity determination unit 450 is added in the second embodiment. The operation of the integrated block length determination unit in the stereo audio signal high-efficiency encoding apparatus according to Embodiment 2 configured as described above will be described below. The operations of the segment signal level calculators 400 and 401, the segment signal level memories 410 and 411, the signal level change detectors 420 and 421, and the block length determiners 430 and 431 are the same as the corresponding blocks in the first embodiment. Therefore, explanation here is omitted.
[0068]
The signal level change detectors 420 and 421 output values indicating the degree of rise and fall of the signal level of the left and right channels, respectively. For the left channel, the signal level change similarity determiner 450 takes the values output from the signal level change detector 420, which indicate the degree of change at the rise and fall of the left-channel signal level, and obtains the maximum value to be considered when determining the block length together with the time at which that maximum occurs. For the right channel, it likewise obtains, from the values output from the signal level change detector 421, the maximum value and the time at which it occurs.
[0069]
When the differences between the two channels in the maximum values for the rise and fall, and in the times at which those maxima occur, are within predetermined ranges, the signals of both channels are determined to be similar; otherwise, they are determined not to be similar. A signal indicating the result is output.
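One possible reading of this similarity test is sketched below: each channel's largest change value and the segment index at which it occurs are compared, and the channels are called similar when both differences are within tolerance. The function name, the use of an absolute difference, and both tolerance defaults are assumptions; the source only says that the values must be within predetermined ranges.

```python
def changes_similar(left_changes, right_changes,
                    max_value_diff=2.0, max_time_diff=1):
    """Decide whether two channels show similar signal level changes.

    left_changes / right_changes: per-segment change values (for example
    rise ratios) over the same time span. Tolerances are hypothetical.
    """
    if not left_changes or not right_changes:
        raise ValueError("change lists must not be empty")
    # Index (time) of the sharpest change in each channel.
    li = max(range(len(left_changes)), key=lambda i: left_changes[i])
    ri = max(range(len(right_changes)), key=lambda i: right_changes[i])
    value_ok = abs(left_changes[li] - right_changes[ri]) <= max_value_diff
    time_ok = abs(li - ri) <= max_time_diff
    return value_ok and time_ok


# Usage: peaks of similar size, one segment apart.
print(changes_similar([1.0, 9.0, 1.0], [1.0, 1.0, 8.5]))
```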
[0070]
The left channel block length determiner 430 outputs the single channel block length of the left channel, that is, the block length that would be used when encoding without joint stereo encoding. Similarly, the right channel block length determiner 431 outputs the single channel block length of the right channel. The block length integrated decision unit 440 determines the block time length of each of the left and right channels based on the left-channel single channel block length from the block length determiner 430, the right-channel single channel block length from the block length determiner 431, and the similarity determination result from the signal level change similarity determiner 450, and outputs a signal indicating the resulting block length of each channel.
[0071]
When the signal level change similarity determiner 450 determines that there is similarity and the two single channel block lengths differ, the block length integrated decision unit 440 sets the block lengths of both channels to the shorter of the two single channel block lengths and outputs a signal indicating the result. In all other cases, it adopts the single channel block length of each channel unchanged as the block time length of that channel and outputs the result.
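The gated integration rule described here differs from Embodiment 1 only in that the shorter block length is forced on both channels solely when the channels were judged similar. A minimal sketch, with a hypothetical function name and a boolean `similar` flag standing in for the similarity determiner's output:

```python
def integrate_with_similarity(left_len, right_len, similar):
    """Return (left, right) block lengths given a similarity decision.

    similar: True if the similarity determiner judged the channels similar.
    Only when the channels are similar AND their single channel block
    lengths differ are both set to the shorter length.
    """
    if similar and left_len != right_len:
        shorter = min(left_len, right_len)
        return shorter, shorter
    return left_len, right_len


# Usage: dissimilar channels keep their own block lengths.
print(integrate_with_similarity(1024, 256, similar=False))
```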
[0072]
As described above, when the signal level changes of the two channels are similar and their single channel block lengths differ, setting the blocks of both channels to the shorter time length suppresses pre-echo and post-echo, and sound quality can be improved by applying joint stereo coding with high coding efficiency.
[0073]
In addition, when there is no such similarity, since it is not expected to improve the coding efficiency by joint stereo coding, coding is performed with a block length suitable for each channel signal.
[0074]
As described above, in the present embodiment, the block length integrated decision unit 440 and the signal level change similarity determiner 450 are provided. When the changes in the signal levels of the two channels are similar and their single channel block lengths differ, the blocks of both channels are unified to the short time length, which makes joint stereo coding easy to apply and improves sound quality through the resulting gain in coding efficiency.
(Embodiment 3)
FIG. 5 is a block diagram showing a configuration of an integrated block length determination unit in the stereo audio signal high efficiency coding apparatus according to the third embodiment. In FIG. 5, 500 and 501 are segment signal level calculators for the left and right channels, 510 and 511 are segment signal level memories for the left and right channels, 520 and 521 are signal level change detectors for the left and right channels, 530 and 531 are block length determiners for the left and right channels, respectively, 540 is a block length integrated decision unit, and 550 is a signal level similarity determiner.
[0075]
A difference in configuration between the first embodiment shown in FIG. 2 and the third embodiment shown in FIG. 5 is that a signal level similarity determination unit 550 is added in the third embodiment.
The operation of the integrated block length determination unit in the stereo audio signal high-efficiency encoding apparatus according to Embodiment 3 configured as described above will be described below. The operations of the segment signal level calculators 500 and 501, the segment signal level memories 510 and 511, the signal level change detectors 520 and 521, and the block length determiners 530 and 531 are the same as the corresponding blocks in the first embodiment. Therefore, explanation here is omitted.
[0076]
The signal level similarity determiner 550 reads the signal levels of the respective channels from the segment signal level memory 510 for the left channel and the segment signal level memory 511 for the right channel, compares them to determine the similarity of the signal levels of the two channels, and outputs a signal indicating the result.
[0077]
That is, the signal level similarity determiner 550 determines that the signal levels of the two channels are similar when, for temporally corresponding segments of the two channels, the ratio of the absolute value of the difference between the signal levels to the sum of the signal levels is equal to or less than a predetermined threshold. Otherwise, it determines that the signals of the two channels are not similar. A signal indicating the result is output.
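The ratio test can be sketched as follows. The threshold default, the requirement that every segment pass the test, and the treatment of a segment that is silent in both channels (skipped, since the ratio is undefined) are assumptions of this sketch; the source states only that the ratio must not exceed a predetermined threshold.

```python
def levels_similar(left_levels, right_levels, threshold=0.25):
    """Decide whether two channels have similar segment signal levels.

    For each pair of time-aligned segments, computes |L - R| / (L + R)
    and requires it to be at or below `threshold` (hypothetical value).
    Levels are assumed non-negative, as sums of squares are.
    """
    if len(left_levels) != len(right_levels):
        raise ValueError("level lists must have the same length")
    for l, r in zip(left_levels, right_levels):
        total = l + r
        if total == 0:
            continue  # both segments silent: ratio undefined, skip
        if abs(l - r) / total > threshold:
            return False
    return True


# Usage: levels 10 and 8 give a ratio of 2 / 18, below the threshold.
print(levels_similar([4.0, 10.0], [4.0, 8.0]))
```

The ratio is 0 when the two levels are equal and approaches 1 when one channel dominates, so it is independent of the overall loudness of the passage.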
[0078]
The left channel block length determiner 530 outputs the single channel block length of the left channel, that is, the block length that would be used when encoding without joint stereo encoding. Similarly, the right channel block length determiner 531 outputs the single channel block length of the right channel. The block length integrated decision unit 540 determines the block time length of each of the left and right channels based on the left-channel single channel block length from the block length determiner 530, the right-channel single channel block length from the block length determiner 531, and the similarity determination result from the signal level similarity determiner 550, and outputs a signal indicating the resulting block length of each channel.
[0079]
When the signal level similarity determiner 550 determines that there is similarity and the two single channel block lengths differ, the block length integrated decision unit 540 sets the block lengths of both channels to the shorter of the two single channel block lengths and outputs a signal indicating the result. In all other cases, it adopts the single channel block length of each channel unchanged as the block time length of that channel and outputs the result.
[0080]
As described above, when the signal levels of the two channels are similar and the single channel block lengths of the two channels are different, the block length of both channels is set to the shorter time length. The sound quality can be improved by suppressing the pre-echo and post-echo and applying joint stereo coding with high coding efficiency.
[0081]
In addition, when there is no such similarity, since it is not expected to improve the coding efficiency by joint stereo coding, coding is performed with a block length suitable for each channel signal.
[0082]
As described above, in the present embodiment, by providing the block length integrated determiner 540 and the signal level similarity determiner 550, the signal levels of the two channels are similar, and their single channel block lengths. When they are different, it is easy to apply joint stereo coding by unifying blocks of both channels with a short time length, and sound quality can be improved by improving coding efficiency by joint stereo coding.
[0083]
[Effects of the invention]
As described above, according to the present invention, based on the single channel block length of two channels, the time length of the block of each channel can be determined by integrating both.
[0084]
In addition, by taking into account the similarity of the signal level changes of the two channels, the block length of each channel can be determined from the single channel block lengths of the two channels only when that similarity means joint stereo coding can be expected to operate efficiently.
[0085]
In addition, considering the similarity of the signal levels of the two channels, the single channel block of the two channels can be used only when joint stereo coding can be expected to operate efficiently due to the similarity of the signal levels. Based on the length, the time length of the block of each channel can be determined.
[0086]
As described above, joint stereo encoding can be applied to the stereo audio signal more effectively than before, which improves the coding efficiency for the stereo audio signal and improves the sound quality of the stereo audio signal obtained from the encoded signal.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an overall configuration of a stereo audio signal high-efficiency encoding apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration of an integrated block length determination unit in the stereo audio signal high-efficiency encoding apparatus according to Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram of a relationship between a block length and a segment in the embodiment of the present invention.
FIG. 4 is a block diagram showing a configuration of an integrated block length determination unit in the stereo audio signal high-efficiency encoding apparatus according to Embodiment 2 of the present invention.
FIG. 5 is a block diagram showing a configuration of an integrated block length determination unit in the stereo audio signal high-efficiency encoding apparatus according to Embodiment 3 of the present invention.
FIG. 6 is a block diagram showing the overall configuration of a conventional stereo audio signal high efficiency encoding device.
FIG. 7 is an explanatory diagram of the relationship between the block length and pre-echo for explaining the operation in the conventional example.
FIG. 8 is a block diagram showing a configuration of a block length determining unit for left and right channels in the conventional example.
FIG. 9 is an explanatory diagram of a mid / side stereo encoding method for explaining the operation in the conventional example.
[Explanation of symbols]
100, 101 Frequency converter
110 Integrated block length determination unit
120 Joint stereo signal generator
130 Quantization and Coding Unit
200, 201, 400, 401, 500, 501 Segment signal level calculator
210, 211, 410, 411, 510, 511 Segment signal level memory
220, 221, 420, 421, 520, 521 Signal level change detector
230, 231, 430, 431, 530, 531 Block length determiner
240, 440, 540 Block length integrated decision unit
450 Signal level change similarity determiner
550 Signal level similarity determiner

Claims

1. A stereo audio signal high-efficiency encoding device that uses a stereo audio signal as an input signal, divides the audio signal of each channel into blocks, and performs high-efficiency encoding by joint stereo encoding based on the block length, the device comprising: means for dividing the audio signal of each channel into segments of a predetermined time width and calculating the signal level of each segment; means for detecting, independently for each channel, a sudden rise or fall of the audio signal from the change in the signal level of the segments; means for calculating, independently for the audio signal of each channel and based on the amount of change in the signal level of the segments, a single channel block length, which is the time length of a block corresponding to the case where the joint stereo encoding is not applied; and means for determining the time length of the block of each channel to be the shorter of the single channel block lengths when the single channel block lengths of the channels are different, and otherwise determining the single channel block length of each channel as it is as the time length of the block of that channel.

2. A stereo audio signal high-efficiency encoding device that uses a stereo audio signal as an input signal, divides the audio signal of each channel into blocks, and performs high-efficiency encoding by joint stereo encoding based on the block length, the device comprising: means for dividing the audio signal of each channel into segments of a predetermined time width and calculating the signal level of each segment; means for detecting, independently for each channel, a sudden rise or fall of the audio signal from the change in the signal level of the segments; means for calculating, independently for the audio signal of each channel and based on the amount of change in the signal level of the segments, a single channel block length, which is the time length of a block corresponding to the case where the joint stereo encoding is not applied; means for comparing the changes in the signal level of the segments between the channels to determine the similarity of the signal level changes; and means for determining the time length of the block of each channel to be the shorter of the single channel block lengths when the signal level changes of the channels are similar and the single channel block lengths of the channels are different, and otherwise determining the single channel block length of each channel as it is as the time length of the block of that channel.

3. The stereo audio signal high-efficiency encoding device according to claim 2, wherein the changes in the signal levels of the channels are determined to be similar when, between the channels, the differences in the signal level change and in the time at which the signal level of the segments changes most sharply are within predetermined ranges.

4. A stereo audio signal high-efficiency encoding device that uses a stereo audio signal as an input signal, divides the audio signal of each channel into blocks, and performs high-efficiency encoding by joint stereo encoding based on the block length, the device comprising: means for dividing the audio signal of each channel into segments of a predetermined time width and calculating the signal level of each segment; means for detecting, independently for each channel, a sudden rise or fall of the audio signal from the change in the signal level of the segments; means for calculating, independently for the audio signal of each channel and based on the amount of change in the signal level of the segments, a single channel block length, which is the time length of a block corresponding to the case where the joint stereo encoding is not applied; means for comparing the signal levels of the segments between the channels to determine the similarity of the signal levels; and means for determining the time length of the block of each channel to be the shorter of the single channel block lengths when the signal levels of the channels are similar and the single channel block lengths of the channels are different, and otherwise determining the single channel block length of each channel as it is as the time length of the block of that channel.

5. The stereo audio signal high-efficiency encoding device according to claim 4, wherein the similarity of the signal levels between the channels is determined using the ratio between the sum of the signal levels of the corresponding segments in each channel and the difference between those signal levels.