WO2024142184A1 - Sound generation program

Info

Publication number: WO2024142184A1
Application number: PCT/JP2022/048049
Authority: WO
Inventors: 梓允安井; 裕樹高田; 竜三高田
Original assignee: Patlite Corp
Current assignee: Patlite Corp
Priority date: 2022-12-26
Filing date: 2022-12-26
Publication date: 2024-07-04
Anticipated expiration: 2025-06-26

Abstract

A voice generation program according to the present invention causes a computer connectable to at least one notification device to execute: a step of acquiring text data and first information relating to at least one first sound; a step of acquiring voice data based on the text data; and a step of combining first sound data relating to the first sound with at least one of the start and the end of the voice data to generate synthetic voice data.

Description

Voice Generation Program

The present invention relates to a voice generation program, a voice generation method, a voice generation device, and a notification apparatus.

Notification apparatuses have been proposed that announce an abnormality by voice when one occurs in equipment such as production machinery (see, for example, Patent Document 1). Such a notification apparatus reads voice data registered in advance in ROM in response to an input signal to a control circuit and outputs it from a speaker.

Patent Document 1: Japanese Utility Model Application Publication No. 7-25493

However, because the voice data in the notification apparatus of Patent Document 1 is registered in advance, the user cannot change it. There has therefore been a demand for generating various notification sounds that meet user needs.

The present invention has been made to solve this problem, and aims to provide a voice generation program, a voice generation method, a voice generation device, and a notification apparatus that can generate various notification sounds.

A first voice generation program according to the present invention causes a computer connectable to at least one notification device to execute the steps of: acquiring text data and first information relating to at least one first sound; acquiring voice data based on the text data; and combining first sound data relating to the first sound with at least one of the front and the rear of the voice data to generate synthetic voice data.

A second voice generation program according to the present invention causes a computer connectable to at least one notification device to execute the steps of: acquiring text data; acquiring voice data based on the text data; acquiring first sound data to be combined with the voice data; and combining the first sound data with at least one of the front and the rear of the voice data to generate synthetic voice data.

The above voice generation program can further cause the computer to execute a step of causing the notification device to play a synthetic voice based on the synthetic voice data.

The above voice generation program can further cause the computer to execute a step of acquiring second information relating to a second sound, and a step of causing the notification device to play the second sound before or after the synthetic voice in accordance with a prescribed order.

The above voice generation program can be configured so that, after playback of one of the second sound and the synthetic voice has finished, the other is played.

The above voice generation program can be configured so that, if the other becomes ready for playback while one of the second sound and the synthetic voice is being played, playback of the one is interrupted and the other is played.

In the above voice generation program, the first information can be information that specifies one of a plurality of pieces of the first sound data stored in advance.

A voice generation method according to the present invention is a voice generation method for at least one notification device, and includes the steps of: acquiring text data and first information relating to at least one first sound; acquiring voice data based on the text data; and combining first sound data relating to the first sound with at least one of the front and the rear of the voice data to generate synthetic voice data.

A voice generation device according to the present invention is a voice generation device connectable to at least one notification device, and includes: a first acquisition unit that acquires text data and first information relating to at least one first sound; a voice data acquisition unit that acquires voice data based on the text data; and a synthesis unit that combines first sound data relating to the first sound with at least one of the front and the rear of the voice data to generate synthetic voice data.

The above voice generation device may further include a playback command unit that causes the notification device to play a synthetic voice based on the synthetic voice data.

A notification apparatus according to the present invention includes the above-described voice generation device and at least one notification device.

According to the present invention, various notification sounds can be generated.

FIG. 1 is an example of a system configuration including a notification apparatus according to an embodiment of the present invention.
FIG. 2 is a front view of the notification apparatus.
FIG. 3 is a rear view of the notification apparatus.
FIG. 4 is a block diagram showing the hardware configuration of the notification apparatus.
FIG. 5 is a block diagram showing the software configuration of the notification apparatus.
FIG. 6 is a diagram showing synthetic voice data.
FIG. 7 is a timing chart showing playback processing of synthetic voice data.
FIG. 8 is a timing chart showing playback processing of synthetic voice data and second sound data.
FIG. 9 is another example of a timing chart showing playback processing of synthetic voice data and second sound data.

An embodiment of the voice generation device according to the present invention will be described below with reference to the drawings. In this embodiment, a notification apparatus incorporating the voice generation device is described. FIG. 1 shows an example configuration of a system including the notification apparatus. The system includes at least one notification apparatus 1 and equipment 3 connected to the notification apparatus 1, and various sounds are output in response to voice control signals transmitted from the equipment 3. A known TTS (Text-to-Speech) service 4 is connected to the notification apparatus 1 via a network 7. Each component is described in detail below.

<1. Hardware configuration of the notification apparatus>

FIG. 2 is a front view of the notification apparatus, FIG. 3 is a rear view of the notification apparatus, and FIG. 4 is a block diagram showing the hardware configuration of the notification apparatus. As shown in FIGS. 1 and 2, the notification apparatus 1 has a base 101 and a body 102 provided on top of the base 101. The body 102 is provided with an LED unit (notification device) 103 extending from the base 101. A speaker (notification device) 105 and various operation buttons 106 are provided on the outer surface of the base 101. As shown in FIG. 4, the base 101 houses a control unit 11, a storage unit 12, a communication interface 13, and an external interface 14. In FIG. 4, the communication interface 13 is labeled "communication I/F" and the external interface 14 is labeled "external I/F".

The base 101 is rectangular in plan view and functions as a housing for the control unit 11 and other components. The LED unit 103 is cylindrical, and its upper part has three light-emitting units, namely first, second, and third light-emitting units 103a to 103c arranged from top to bottom. Each of the light-emitting units 103a to 103c is cylindrical and contains LEDs that emit red, yellow, and green light, respectively. These light-emitting units 103a to 103c light up according to the state of the equipment 3 described above.

As shown in FIG. 1, the speaker 105 is provided at the lower part of the front surface of the base 101 and, as described later, outputs sound indicating the state of the equipment 3. Although not illustrated, the base 101 also has a volume switch for adjusting the sound volume. The various operation buttons 106 are provided above the speaker 105 on the front surface of the base 101.

As shown in FIG. 3, connectors for the communication interface 13 and the external interface 14 are provided on the rear surface of the base 101.

As shown in FIG. 4, the control unit 11 includes a CPU, RAM, ROM, and the like, and is configured to execute various kinds of information processing based on programs and data. The storage unit 12 is an auxiliary storage device such as an HDD or SSD, and stores a voice generation program 121 for generating voice, registered first sound data 122, text data 123 transmitted from the equipment 3, registered second sound data 124, and various data 125 for driving the notification apparatus 1. The voice generation program 121 generates synthetic voice data (described later) in response to a voice control signal transmitted from the equipment 3 and plays it as a synthetic voice. The generated synthetic voice data may be stored in the storage unit 12. The first sound data 122 and the second sound data 124 are data in a format such as mp3, raw, or wav, and are used to play the first sound and the second sound, respectively. These sounds may be of any kind, such as notification tones or voices. As described later, the first sound data is dedicated to being combined into synthetic voice data, whereas the second sound data is used for purposes other than synthesis; that is, the second sound data is data for playing the second sound on its own. The second sound is a general term for sounds other than the first sound. It typically differs depending on the event that has occurred (for example, the occurrence of an abnormality), but all second sounds may also be the same sound.

The speaker 105 is a known speaker and outputs sound by playing the synthetic voice data and the second sound data 124.

The communication interface 13 is, for example, a wired LAN (Local Area Network) module or a wireless LAN module, and is an interface for wired or wireless communication. That is, the communication interface 13 exchanges various data between the notification apparatus 1 and external devices over a wired or wireless network in accordance with a predetermined communication protocol. The external TTS service 4 is connected to the communication interface 13 via a network such as the Internet.

The external interface 14 is an interface for connecting to an external device and is configured as appropriate for the device to be connected. In this embodiment, the equipment 3 is connected to the external interface 14. The equipment 3 may be, for example, production equipment, a printer, a server, a network camera, a disaster prevention system, a security system, an entry/exit system, an environmental monitoring system, or an ordering system, and can be any device that uses sound to announce a change in status or the occurrence of some event (for example, a fault or abnormality). When a voice control signal from the equipment 3 is input to the notification apparatus 1 via the external interface 14, the voice generation program 121 executed by the control unit 11 causes sound to be output from the speaker 105.

The voice control signal from each piece of equipment 3 can be transmitted in various ways, for example by using HTTPS/HTTP communication commands, cloud communication, SSH/RSH commands, or SOCKET communication commands. The voice control signal includes the first information, the second information, and the text data described later.

<2. Software configuration of the notification apparatus>

Next, the software configuration of the notification apparatus 1 will be described. FIG. 5 is a block diagram showing the software configuration of the notification apparatus 1. As shown in FIG. 5, the control unit 11 of the notification apparatus 1 loads the voice generation program 121 stored in the storage unit 12 into the RAM, and the CPU interprets and executes the program. The control unit 11 thereby functions as a computer including a first acquisition unit 111, a second acquisition unit 112, a voice data acquisition unit 113, a synthesis unit 114, and a playback command unit 115.

The first acquisition unit 111 acquires the first information and the text data transmitted from the equipment 3. The first information specifies the first sound data 122 described above; that is, it specifies one of the plurality of pieces of first sound data 122 stored in the storage unit 12. In this embodiment, the first information specifies two pieces of first sound data 122, one to be joined before the voice data (described later) and one to be joined after it. The two pieces may be the same or different. The specified first sound data is sent to the synthesis unit 114.

The first acquisition unit 111 also acquires the text data from the equipment 3. The first information and the text data are transmitted to the first acquisition unit 111 as one data set.
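
As a rough illustration of such a data set, the following Python sketch decodes a voice control signal and resolves the first information to stored first sound data. The JSON layout, the field names (`text`, `first_before`, `first_after`), and the sound identifiers are assumptions made for this example; the description does not define a wire format.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry standing in for the first sound data 122 in the storage unit 12.
# The keys and byte contents are placeholders, not values from the description.
FIRST_SOUND_DATA = {
    "chime_a": b"\x01\x02",
    "chime_b": b"\x03\x04",
}


@dataclass
class DataSet:
    """Text data plus first information, received together as one data set."""
    text: str
    first_before: Optional[str]  # id of the first sound joined before the voice data
    first_after: Optional[str]   # id of the first sound joined after the voice data


def parse_data_set(payload: str) -> DataSet:
    """Decode an assumed JSON payload into a DataSet."""
    obj = json.loads(payload)
    return DataSet(obj["text"], obj.get("first_before"), obj.get("first_after"))


def resolve_first_sound(sound_id: Optional[str]) -> bytes:
    """Return the stored first sound data for an id, or empty bytes if no id is given."""
    if sound_id is None:
        return b""
    return FIRST_SOUND_DATA[sound_id]
```

Keeping the identifiers optional lets the same structure cover the case where a first sound is joined on only one side of the voice data.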

The second acquisition unit 112 acquires the second information transmitted from the equipment 3. The second information specifies the second sound data 124 described above; that is, it specifies one of the plurality of pieces of second sound data 124 stored in the storage unit 12. The specified second sound data is sent to the playback command unit 115.

The voice data acquisition unit 113 sends the acquired text data to the TTS service 4 and acquires the voice data that the service generates.

As shown in FIG. 6, the synthesis unit 114 joins the first sound data specified via the first acquisition unit 111 before and after the voice data to generate synthetic voice data. For example, if the text data is "An abnormality has occurred," synthetic voice data is generated in which first sound data for a notification chime such as "ping-pong-pan-pong" is added before and after the corresponding voice data. The result is data for playing one continuous sound: the chime, then "An abnormality has occurred," then the chime again. Note that the first acquisition unit 111, the voice data acquisition unit 113, and the synthesis unit 114 process data in the order in which it is acquired.
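
The joining step can be sketched in a few lines of Python. This is a minimal sketch that assumes all inputs are headerless PCM with the same sample rate, sample width, and channel count, so that byte concatenation yields valid audio; the function name `combine` is hypothetical. Container formats such as mp3 or wav would need decoding or header handling first, which is not shown.

```python
from typing import Optional


def combine(voice: bytes,
            before: Optional[bytes] = None,
            after: Optional[bytes] = None) -> bytes:
    """Join first sound data to the front and/or rear of the voice data.

    Assumption: every input is headerless PCM in an identical format,
    so plain byte concatenation produces one continuous sound.
    """
    return (before or b"") + voice + (after or b"")
```

Calling `combine(voice, chime, chime)` corresponds to the chime-voice-chime layout of FIG. 6, while omitting `before` or `after` joins the first sound on one side only.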

The playback command unit 115 is responsible for playing the generated synthetic voice data and the second sound data. That is, the playback command unit 115 issues playback commands for the synthetic voice data and the second sound data to the speaker 105, and the speaker 105 plays them and outputs the resulting sound. The playback command unit 115 also has a playback queue and issues playback commands according to it: it waits until the speaker 105 has finished playing the previously sent synthetic voice data or second sound data, and then sends the speaker 105 the playback command for the next data.
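
The queueing behavior described above can be sketched as a small first-in, first-out class. The class and method names, the `play` callback, and the idea that the speaker reports completion through `on_playback_finished` are all assumptions for illustration; the description does not specify how completion is detected.

```python
from collections import deque
from typing import Callable, Deque


class PlaybackCommandUnit:
    """FIFO playback queue: the next item is commanded only after the current one ends."""

    def __init__(self, play: Callable[[str], None]) -> None:
        self._play = play                  # hypothetical callback that starts playback
        self._queue: Deque[str] = deque()  # items waiting for playback
        self._busy = False                 # True while the speaker is playing

    def submit(self, item: str) -> None:
        """Receive synthetic voice data or second sound data (a label stands in for audio)."""
        self._queue.append(item)
        self._start_next_if_idle()

    def on_playback_finished(self) -> None:
        """Called when the speaker reports that the current item has finished."""
        self._busy = False
        self._start_next_if_idle()

    def _start_next_if_idle(self) -> None:
        # Issue a playback command only when nothing is playing and something is waiting.
        if not self._busy and self._queue:
            self._busy = True
            self._play(self._queue.popleft())
```

If nothing is playing when an item is submitted, it is commanded at once; otherwise it waits until `on_playback_finished` is called for the item ahead of it.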

<3. Sound playback processing>

Next, sound playback processing in the notification apparatus 1 configured as described above will be described with reference to the timing charts of FIGS. 7 and 8. In FIGS. 7 and 8, the horizontal direction (left to right) shows the flow of processing, and the vertical direction (top to bottom) shows time. The taller a processing block is, the longer that processing takes.

First, playback of the synthetic voice data will be described with reference to FIG. 7. When the voice data acquisition unit 113 receives text data, it sends the text data to the TTS service 4 and requests generation of voice data. In response, the TTS service 4 generates voice data corresponding to the text data and sends it to the voice data acquisition unit 113.

The generated voice data is sent to the synthesis unit 114 and combined with the first sound data. Once the synthetic voice data has been generated, it is sent to the playback command unit 115. The playback command unit 115 holds data for playback according to the playback queue, but if no other data is ahead of it, the playback command is sent to the speaker 105 without waiting. The speaker 105 then plays the synthetic voice data.
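
The flow of FIG. 7 can be summarized in a short Python sketch. The TTS service is represented by an injected callable because the description does not name a particular service or API; `fake_tts`, `generate_and_submit`, and the `submit` callback are hypothetical stand-ins, and audio is again treated as same-format raw bytes.

```python
from typing import Callable


def generate_and_submit(
    text: str,
    first_sound: bytes,
    tts: Callable[[str], bytes],
    submit: Callable[[bytes], None],
) -> None:
    """Text data -> voice data -> synthetic voice data -> playback command unit."""
    voice = tts(text)                              # request voice data for the text
    synthetic = first_sound + voice + first_sound  # join the first sound before and after
    submit(synthetic)                              # hand over only the completed data


def fake_tts(text: str) -> bytes:
    """Stand-in for the TTS service 4; a real service would return audio, not UTF-8 text."""
    return text.encode("utf-8")
```

The point of the ordering is that the playback command unit only ever receives one finished piece of data, never the voice data and the first sound data separately.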

Next, playback of the synthetic voice data and the second sound data will be described. As an example, consider the case shown in FIG. 8, in which the second acquisition unit 112 receives the second information after the text data has been sent to the voice data acquisition unit 113. As described above, when the second acquisition unit 112 acquires the second information, it identifies the corresponding second sound data and sends it to the playback command unit 115. Identifying the second sound data takes essentially no time, whereas generating the voice data takes time. In this example, therefore, the second sound data reaches the playback command unit 115 before the synthetic voice data, even though the second sound data was specified after the text data was sent to the voice data acquisition unit 113. Because no data is waiting to be played ahead of the second sound data, the playback command for the second sound data is executed immediately.

While the second sound data is being played, the synthetic voice data waits for playback until playback of the second sound data is complete, even if the synthetic voice data has already been generated and its playback command could otherwise be executed. When playback of the second sound data is complete, the playback command unit 115 issues the playback command for the synthetic voice data.

After that, when another piece of second sound data is sent to the playback command unit 115, its playback command is executed immediately if playback of the synthetic voice data has already finished. This example plays one synthetic voice and two second sounds. As noted above, "second sound" is a general term for notification sounds other than the first sound, so the two second sounds may be the same or different.
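
The timing relationship in FIG. 8 can be worked through with a small simulation. The sketch below assumes each item is described by the time it becomes ready at the playback command unit and by its duration; the function name and all numbers in the usage note are invented for illustration and are not taken from the figures.

```python
from typing import List, Tuple

Item = Tuple[str, float, float]   # (name, ready_time, duration)
Slot = Tuple[str, float, float]   # (name, start_time, end_time)


def fifo_timeline(items: List[Item]) -> List[Slot]:
    """Compute when each item plays if items are served in the order they become ready.

    Each item starts when it is ready or when the previous playback ends,
    whichever is later.
    """
    timeline: List[Slot] = []
    free_at = 0.0  # time at which the speaker becomes idle
    for name, ready, duration in sorted(items, key=lambda it: it[1]):
        start = max(ready, free_at)
        end = start + duration
        timeline.append((name, start, end))
        free_at = end
    return timeline
```

For instance, suppose the synthetic voice becomes ready at 2.0 s (after TTS generation) and lasts 4.0 s, a second sound is ready at 0.5 s and lasts 3.0 s, and another second sound is ready at 8.0 s and lasts 1.0 s. The first second sound then plays from 0.5 s to 3.5 s, the synthetic voice waits and plays from 3.5 s to 7.5 s, and the last second sound starts immediately at 8.0 s.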

<4. Features>

As described above, this embodiment provides the following effects.

(1) Notification data can be generated freely by combining the first sound data with voice data generated from any desired text, rather than relying on data stored in advance. This increases the flexibility with which notification data can be generated to meet user needs.

(2) The synthetic voice described above could also be played by sending the voice data and the first sound data to the playback command unit 115 separately and having the playback command unit 115 issue commands to play them back to back. In that case, however, if other sound data reaches the playback command unit 115 between the voice data and the first sound data, the voice data and the first sound data may not be played consecutively. In this embodiment, by contrast, the voice data and the first sound data are combined into synthetic voice data before being sent to the playback command unit 115, which reliably prevents the voice data and the first sound data from being played apart from each other.

<5. Modifications>

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment, and various modifications are possible without departing from its spirit. The modifications described below can be combined as appropriate.

<5-1>

In the above embodiment, one piece of first sound data is joined to each of the front and the rear of the voice data, but multiple pieces of first sound data may be joined. Also, although the first sound data 122 is joined to both the front and the rear of the voice data in the above embodiment, it may be joined to only one of them.

<5-2>

In the above embodiment, as shown in FIG. 8, the synthetic voice data is played after playback of the second sound data has finished, but data sent to the playback command unit 115 later can instead be given priority. For example, as shown in FIG. 9, if synthetic voice data is sent to the playback command unit 115 and becomes ready for playback while the second sound data is still being played, the playback command unit 115 can execute the playback command for the synthetic voice data without waiting for the second sound data to finish.
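
A minimal sketch of this interrupting variant follows, using the same (name, ready time, duration) description of items as a simple timeline model. It assumes that an interrupted item is cut off and not resumed afterwards; the description does not say whether playback resumes, so this is one possible reading. The function name and the numbers in the usage note are illustrative only.

```python
from typing import List, Tuple

Item = Tuple[str, float, float]         # (name, ready_time, duration)
Slot = Tuple[str, float, float, bool]   # (name, start_time, end_time, interrupted)


def preemptive_timeline(items: List[Item]) -> List[Slot]:
    """Compute playback slots when a newly ready item interrupts the one being played.

    Every item starts as soon as it is ready. It ends either when its own
    duration elapses or when the next item becomes ready, whichever is earlier.
    """
    ordered = sorted(items, key=lambda it: it[1])
    timeline: List[Slot] = []
    for i, (name, ready, duration) in enumerate(ordered):
        end = ready + duration
        interrupted = False
        if i + 1 < len(ordered) and ordered[i + 1][1] < end:
            end = ordered[i + 1][1]  # cut off by the next item
            interrupted = True
        timeline.append((name, ready, end, interrupted))
    return timeline
```

With a second sound ready at 0.5 s lasting 3.0 s and a synthetic voice ready at 2.0 s lasting 4.0 s, the second sound is cut off at 2.0 s and the synthetic voice plays from 2.0 s to 6.0 s.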

<5-3>

In the above embodiment, the voice data is acquired using the TTS service 4 external to the notification apparatus 1, but the method of acquiring the voice data is not limited to this. For example, software for TTS processing may be built into the notification apparatus 1 and used to generate the voice data.

<5-4>

In the above embodiment, the first sound data 122 and the second sound data 124 are stored separately, but both can instead be acquired from a single sound data database. Also, in the above embodiment, the first information and the second information are acquired from the equipment 3 and used to specify the sound data 122, 124 stored in the notification apparatus 1, but at least one of the first sound data 122 and the second sound data 124 can, for example, be acquired directly from the equipment 3. Alternatively, a database of the first sound data and the second sound data can be provided outside the notification apparatus 1, and the sound data specified by the first information and the second information can be acquired from outside the notification apparatus 1.

<5-5>

In the above embodiment, the first sound data is set based on the first information (information specifying the first sound data, or the first sound data itself) included in the voice control signal transmitted from the equipment 3, but the notification apparatus 1 can also select the first sound data according to a predetermined criterion. For example, when the notification apparatus 1 receives a voice control signal from the equipment 3 at a specific time, the control unit 11 can select the first sound data corresponding to that time and join it to the voice data. Alternatively, when a voice control signal is received from a specific piece of equipment 3, the control unit 11 can select the first sound data corresponding to that equipment 3 and join it to the voice data. Setting such criteria in the notification apparatus 1 in advance makes it possible to generate a synthetic voice that meets the user's needs. The same applies to the second sound: for example, the equipment 3 can send only a signal instructing notification, without the second information, and the notification apparatus 1 can select the second sound based on criteria like those described above.
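
Selection by such criteria could look like the following Python sketch. The rule tables, equipment identifiers, sound identifiers, time range, and the precedence (equipment rule first, then time of day, then a default) are all assumptions chosen for illustration; the description only says that a predetermined criterion is used.

```python
from datetime import time
from typing import Dict, List, Tuple

# Hypothetical rules; identifiers and time ranges are illustrative only.
BY_EQUIPMENT: Dict[str, str] = {"printer-1": "chime_soft", "press-2": "siren"}
BY_TIME: List[Tuple[time, time, str]] = [
    (time(8, 0), time(17, 0), "chime_day"),
]
DEFAULT_SOUND = "chime_default"


def select_first_sound(equipment_id: str, received_at: time) -> str:
    """Pick a first sound id from the sender and the time the signal was received."""
    # Assumed precedence: a rule for the specific equipment wins over a time rule.
    if equipment_id in BY_EQUIPMENT:
        return BY_EQUIPMENT[equipment_id]
    for start, end, sound_id in BY_TIME:
        if start <= received_at < end:
            return sound_id
    return DEFAULT_SOUND
```

The same lookup shape could serve for choosing a second sound when the equipment sends only a notification instruction.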

<5-6>
In the above embodiment, the first sound and the text data are transmitted from the equipment 3 to the notification device 1 as one data set in order to generate the synthetic voice data. If the data set does not include the first sound, however, it is also possible to generate only voice data from the text data, without generating synthetic voice data, and to play that voice data. Also, instead of being transmitted as one data set, the first sound and the text data can be associated with each other and transmitted at different times. Even in that case, the synthetic voice data is not passed to the playback command unit 115 until the synthesis process is complete.
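As a non-limiting illustration, the handling described above might be sketched as follows: voice data alone is produced when no first sound is supplied, and parts that arrive at different times are held until both are present. The class `Synthesizer`, the association key, and the `text_to_voice` callback are hypothetical. Joining audio by simple byte concatenation is also an assumption; it only holds for raw samples that share one format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Pending:
    text: Optional[str] = None
    first_sound: Optional[bytes] = None


class Synthesizer:
    def __init__(self, text_to_voice: Callable[[str], bytes]) -> None:
        self._tts = text_to_voice          # hypothetical text-to-voice step
        self._pending: Dict[str, Pending] = {}

    def build(self, text: str, first_sound: Optional[bytes] = None,
              position: str = "front") -> bytes:
        voice = self._tts(text)
        if first_sound is None:
            # No first sound in the data set: voice data only.
            return voice
        # Combine the first sound with the front or the rear of the voice data.
        return first_sound + voice if position == "front" else voice + first_sound

    def receive(self, key: str, text: Optional[str] = None,
                first_sound: Optional[bytes] = None) -> Optional[bytes]:
        """Accept associated parts sent at different times.

        Returns None until both parts have arrived, so nothing is handed
        to playback before synthesis is complete.
        """
        entry = self._pending.setdefault(key, Pending())
        if text is not None:
            entry.text = text
        if first_sound is not None:
            entry.first_sound = first_sound
        if entry.text is None or entry.first_sound is None:
            return None
        del self._pending[key]
        return self.build(entry.text, entry.first_sound)
```

A caller would forward only a non-`None` result of `receive` to the playback command unit.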

<5-7>
The form of the notification device shown in the above embodiment is merely an example, and its shape, the arrangement of its operation buttons, and the like can be modified in various ways. For example, the notification device may omit the LED unit 103 and notify by sound alone.

<5-8>
In the above embodiment, the first acquisition unit 111, the second acquisition unit 112, the voice data acquisition unit 113, the synthesis unit 114, and the playback command unit 115 are provided in the notification device 1. However, a device having at least these functional components may instead be configured as a voice generation device, and the notification device may be formed by connecting to it an output device for notification sounds, such as a speaker. In this case, the voice generation device may be an information processing device designed specifically for the service to be provided, a general-purpose desktop PC, a tablet PC, or the like. The playback command unit 115 may also be omitted from the voice generation device and provided in another device. In other words, the voice generation device may have the generation of synthetic voice data as its main function and issue no playback command.
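As a non-limiting illustration, a voice generation device whose playback command function is optional might be sketched as follows. The class `VoiceGenerationDevice` and the `text_to_voice` and `play` callbacks are hypothetical, and byte concatenation again stands in for the actual combining of audio.

```python
from typing import Callable, Optional


class VoiceGenerationDevice:
    def __init__(
        self,
        text_to_voice: Callable[[str], bytes],
        play: Optional[Callable[[bytes], None]] = None,
    ) -> None:
        self._tts = text_to_voice
        # `play` stands in for a playback command unit; None means that
        # function is provided by another device.
        self._play = play

    def handle(self, text: str, first_sound: bytes) -> bytes:
        # Generate the synthetic voice data (first sound placed in front).
        data = first_sound + self._tts(text)
        # Issue a playback command only if this device has that function.
        if self._play is not None:
            self._play(data)
        return data
```

Constructed without `play`, the device only returns the synthetic voice data, which another device can then play.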

REFERENCE SIGNS LIST
1 Notification device
111 First acquisition unit
112 Second acquisition unit
113 Voice data generation unit
114 Synthesis unit
115 Playback command unit
121 Voice generation program
122 First sound data
123 Text data
124 Second sound data

Claims

1. A voice generation program that causes a computer connectable to at least one notification device to execute:
a step of obtaining text data and first information relating to at least one first sound;
a step of obtaining voice data based on the text data; and
a step of combining first sound data relating to the first sound with at least one of the front and the rear of the voice data to generate synthetic voice data.

2. A voice generation program that causes a computer connectable to at least one notification device to execute:
a step of obtaining text data;
a step of obtaining voice data based on the text data;
a step of obtaining first sound data to be combined with the voice data; and
a step of combining the first sound data with at least one of the front and the rear of the voice data to generate synthetic voice data.

3. The voice generation program according to claim 1 or 2, further comprising a step of playing a synthetic voice based on the synthetic voice data on the notification device.

4. The voice generation program according to claim 3, further comprising:
a step of obtaining second information relating to a second sound; and
a step of playing the second sound on the notification device before or after the synthetic voice, in a specified order.

5. The voice generation program according to claim 4, configured to play one of the second sound and the synthetic voice after playback of the other is completed.

6. The voice generation program according to claim 4, configured to, if playback of one of the second sound and the synthetic voice becomes possible while the other is being played, interrupt the playback in progress and play the one that has become playable.

7. The voice generation program according to claim 1, wherein the first information is information that specifies one of a plurality of pieces of the first sound data stored in advance.

8. A voice generation method for at least one notification device, comprising:
a step of obtaining text data and first information relating to at least one first sound;
a step of obtaining voice data based on the text data; and
a step of combining first sound data relating to the first sound with at least one of the front and the rear of the voice data to generate synthetic voice data.

9. A voice generation device connectable to at least one notification device, comprising:
a first acquisition unit that acquires text data and first information relating to at least one first sound;
a voice data acquisition unit that acquires voice data based on the text data; and
a synthesis unit that combines first sound data relating to the first sound with at least one of the front and the rear of the voice data to generate synthetic voice data.

10. The voice generation device according to claim 9, further comprising a playback command unit that causes the notification device to play back a synthetic voice based on the synthetic voice data.

11. A notification apparatus comprising:
the voice generation device according to claim 10; and
the at least one notification device.