
WO2005101259A1 - Method and system for sending an audio message - Google Patents

Method and system for sending an audio message

Info

Publication number
WO2005101259A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
message
audio message
recipient
control information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/IB2005/051156
Other languages
French (fr)
Inventor
Eric Thelen
Thomas Portele
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Philips Intellectual Property and Standards GmbH
Koninklijke Philips NV
Original Assignee
Philips Intellectual Property and Standards GmbH
Koninklijke Philips Electronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Philips Intellectual Property and Standards GmbH, Koninklijke Philips Electronics NV
Priority to JP2007507894A (JP2007533236A)
Priority to EP05718667A (EP1738277A1)
Publication of WO2005101259A1


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/28Constructional details of speech recognition systems
    • G10L15/30Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10Multimedia information
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail

Definitions

  • This invention relates to a method for sending an audio message from a sender to a recipient over an audio messaging system and to an appropriate audio messaging system. Further the invention relates to a transmitting device and to a receiving device for such an audio messaging system.
  • SMS Short Messaging Service
  • Text messaging systems like AOL's Instant Messenger, Microsoft's MSN Messenger and Yahoo's Messenger for PCs can be used free of charge after downloading the required free software.
  • Some of these PC-based messaging providers offer a voice-chat functionality in addition to the text messaging services. Furthermore, some other providers have specialised in voice chat, ultimately leading to a voice-over-IP (internet protocol) scenario.
  • voice chat functionality is the possibility for the user to interact explicitly, for example by choosing a chat window and typing there or by other actions like writing a word document and sending it.
  • voice interaction is continually transmitted, i.e. an uninterrupted interchange takes place. This is often not what the user really wants, for example when he is in a room with other people and only wishes to transmit specific remarks as messages, whereas the remarks directed by him at the other people in the room generally should not be transmitted.
  • Normal telephony allows the user to circumvent this problem by covering the microphone with his hand or switching the telephone to mute.
  • this is not possible when using a hands-free telephone or headset.
  • the recipient of a message has a similar problem — while it is possible to read private messages received using text-based messaging services, even when a third party is in the same room, by reading the messages from a screen or a display which cannot be viewed by the third party, it is next to impossible to ensure that audible messages not be heard by third parties for whom the messages are not intended, unless the messages are listened to through headphones.
  • Text messaging systems do indeed appear to enjoy a greater acceptance level than voice chat functionality. This is probably owing to the tendency that users do not really desire a permanent conversation experience. On the one hand, they want to be able to connect to the other person. On the other hand, they would just as equally like to be connected in an offline mode in which they are not permanently involved in an ongoing conversation in which all their remarks are communicated.
  • an object of the present invention is to provide a method for sending an audio message from a sender to a recipient over an audio messaging system and an appropriate audio messaging system which offers to the user essentially the same experience as text messaging systems.
  • the user should be able to easily send specific utterances as audio messages while excluding other utterances from being sent by the messaging system.
  • the present invention provides a method for sending an audio message from a sender to a recipient over an audio messaging system comprising the following steps: First, a sender's audio message is collected by a transmitting device. The message is usually generated by the sender speaking the message.
  • alternatively, the sender can generate the message or parts of the message in another form, for example by singing, playing an instrument, clapping hands etc.
  • This audio message will then be analysed to detect a control information part, also called “audio header” in the following, containing directives such as details of communication specifications of the message; and a main part comprising the effective message or effective information which is to be sent to the recipient, also called “audio body” in the following.
  • the terms "sender" and "receiver" do not necessarily imply individual users, but can mean user groups, a member or all members of such a group.
  • a user group might use a single shared transmitting or receiving device, for example members of a family to whom the device belongs, or employees in an office using a device designated for that office.
  • a user group might also mean a group of users each of whom has his own device, in which case a message destined for the user group will be transmitted to all receiving devices.
  • the communication specifications of the message, incorporated in the control information part, may be any kind of transmission and/or presentation specification like, for example, a message type and/or a sending mode, e.g. information specifying that the message is secret, private, urgent etc.
  • the control information part could also include information for sender identification or for specifying the recipient of the message. For instance, a typical audio header may be "Private message from Bob to Carl". This control information part of the audio message is at least partially interpreted for controlling the audio messaging system for transmitting and/or presenting the specific audio message.
  • a control signal for the transmitting device and/or the receiving device and/or other parts like transceiving stations, router etc. of the audio messaging system may be generated based on the control information part.
  • at least the main part of the audio message is sent to a receiving device located in the vicinity of the recipient and is presented there to the recipient.
  • An appropriate audio messaging system for sending an audio message from a sender to a recipient according to this method comprises a transmitting device with a user interface for collecting a sender's audio message and message analysing means for analysing the audio message for detecting a control information part concerning communication specifications of the audio message and a main part comprising the actual message which is to be sent to the recipient.
  • the audio messaging system comprises an interpreting unit for at least partially interpreting the control information part of the audio message for controlling the audio messaging system for communicating the specific audio message.
  • the audio messaging system comprises a receiving device with a user interface for presenting at least the main part of the audio message to the recipient.
  • the audio messaging system requires a means for transmitting at least the main part of the audio message from the transmitting device to the receiving device.
  • the system analyses the audio message accordingly and separates the audio header containing the control information from the audio body with any utterances intended for transmission. If the system is unable to detect an audio header with appropriate directions for communicating a message to a particular person in a particular manner, then nothing will be transmitted. This is illustrated in the following simple example: assuming a user of the system says "Message to Carl: the soccer match starts at 7.00 pm", this utterance will be picked up by the user interface of the transmitting device and analysed. The audio header "Message to Carl" will be detected and interpreted, and the message "The soccer match starts at 7.00 pm" will be transmitted to a recipient called "Carl".
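  • The header/body separation described above can be sketched as follows. This is a minimal illustration, not the method of the application: it assumes the speech recogniser has already produced a text transcript in which the pause after the header is marked by a colon, and the header grammar, the `AudioMessage` type and the `split_message` function are hypothetical names introduced here.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical header grammar (an assumption, not taken from the application):
#   "[secret|private|urgent] message [from X] to Y: <body>"
HEADER = re.compile(
    r"^(?:(?P<mode>secret|private|urgent)\s+)?message"
    r"(?:\s+from\s+(?P<sender>\w+))?"
    r"\s+to\s+(?P<recipient>\w+)\s*:\s*(?P<body>.+)$",
    re.IGNORECASE,
)


class AudioMessage(NamedTuple):
    mode: Optional[str]      # e.g. "private", or None if no mode was spoken
    sender: Optional[str]    # optional "from X" part of the header
    recipient: str           # identifier string of the recipient
    body: str                # main part to be transmitted


def split_message(transcript: str) -> Optional[AudioMessage]:
    """Return header fields and body, or None if no audio header is found."""
    match = HEADER.match(transcript.strip())
    if match is None:
        return None  # no header detected: nothing is transmitted
    mode = match.group("mode")
    return AudioMessage(
        mode.lower() if mode else None,
        match.group("sender"),
        match.group("recipient"),
        match.group("body").strip(),
    )
```

  • With this sketch, an utterance without a recognisable header yields `None`, which corresponds to the case in which nothing is transmitted.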
  • the invention provides an exceptionally simple and user-friendly means of controlling the system, so that only certain utterances are transmitted by the audio messaging system to other persons, without having to first deactivate the system or parts of the system, for example a microphone or loudspeaker.
  • the sending user can control the system with respect to transmitting the message and presenting it, whereby all control directives can be comfortably included in the message by means of appropriate formulation in an audio header, without the user having to carry out any manual actions.
  • the entire control of the audio messaging system can be comfortably carried out using a hands-free set.
  • the control information part of the audio message is also at least partially transmitted to the receiving device and interpreted for controlling the presentation of the audio message to the recipient.
  • the receiving device receives appropriate information, with the aid of the audio header, for example as to when, how and to which user(s) the audio message or the audio body of the audio message is to be output.
  • the audio header can also be output at least partially to the recipient. Since the control information part preferably deals with commands spoken by the user, automatic speech recognition techniques can be used to identify the control information part within the audio message, whereby automatic speech recognition in this case does not imply speech recognition in a strict sense, but rather language understanding techniques.
  • the transmitting device should comprise an automatic speech recognition arrangement.
  • the audio message is preferably built up in a defined composite structure in which the control information part is positioned at a specific position respective to the main part. More preferably, the control information part is positioned at the beginning of the audio message and followed by the main part. The advantage of this is that the control information part is the first to be detected by the speech recognition arrangement, and the following main part need only be buffered or prepared for transmission.
  • the control information part can, however, be located at any suitable position within the message, for example at the end of the message, or the control information part might be distributed over several positions in the message, so that certain control information is located at the start of the message and further control information is located towards the middle or at the end of the message.
  • Analysis of the audio message with the aid of an automatic speech recogniser might involve, for example, searching for certain key-words that might be stored by the audio messaging system in an appropriate memory such as a storage unit in the transmitting device or receiving device.
  • key-words might be "message", "message to" etc., descriptors for possible recipients of the messages, as well as key-words specifying the type of message or manner of transmission, for example "secret", "private" or "urgent".
  • unique identifier strings are associated with the possible users or user groups of the audio messaging system.
  • Such a unique identifier string might comprise, for example, the user's real name, or might equally well be any other string concealing the identity of the various users.
  • entire user groups can be identified collectively using a single string.
  • the use of nicknames or fantasy names which can most easily be recalled by the other users is preferred. These nicknames are included in the system's vocabulary and can be used to efficiently address a fellow user in the audio header by just saying his nickname.
  • groups can be defined where all connected members will receive the message if the audio header contains the name of the group.
  • the identifier strings of the possible recipients are stored together with the corresponding address book entries in a memory of the transmitting device and, if need be, in the receiving device or in a further suitable location in the audio messaging system.
  • Audio messages will often be sent to a number of people at the same time. During a longer conversation the same list of recipients will be frequently used. When speaking the audio header, it is inconvenient for a user if all names of all recipients have to be spoken each time. Therefore, dynamically associating nicknames or other identifier strings with the list of relevant address book entries will make the sending of messages more comfortable.
  • a key-word like "Reply" or similar is used to indicate in the audio header that the associated audio message should be transmitted to the sender of the last message received and possibly to all users to whom the last message was sent.
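  • The lookup of recipients from identifier strings, group names and the "Reply" key-word might be sketched as follows. The address book contents, the device addresses and the `resolve` function are hypothetical and serve only to illustrate the idea; replying to all recipients of the last message is not covered.

```python
from typing import Dict, List, Optional

# Hypothetical address book: identifier string (nickname) -> device address.
ADDRESS_BOOK: Dict[str, str] = {"carl": "dev-17", "bob": "dev-03", "julie": "dev-21"}

# Hypothetical groups: group identifier string -> member identifier strings.
# Entries can be added at run time to associate a name with a recipient list.
GROUPS: Dict[str, List[str]] = {"family": ["carl", "julie"]}


def resolve(target: str, last_sender: Optional[str] = None) -> List[str]:
    """Map the recipient named in an audio header to device addresses."""
    key = target.lower()
    if key == "reply":
        if last_sender is None:
            return []  # nothing to reply to
        key = last_sender.lower()
    members = GROUPS.get(key, [key])  # a plain nickname is a group of one
    return [ADDRESS_BOOK[m] for m in members if m in ADDRESS_BOOK]
```

  • An unknown name resolves to an empty list, so the caller can fall back to a clarifying dialog instead of transmitting.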
  • the transmitting device is preferably realised as a dialog system, comprises such a dialog system, or is part of such a dialog system.
  • an automatic dialog can be initiated between the audio messaging system, or more particularly the transmitting device, and the sender, in order to identify the control information part of the audio message when an ambiguity value (e.g. based on an internal confidence measure) of a recognition result of the automatic speech recogniser reaches or exceeds a certain ambiguity threshold level.
  • the system can issue a prompt to the user asking for confirmation, or can enter into a dialog with the user to allow correction of a supposed audio header. In this way, the system ensures that no message is sent unintentionally, or sent to the wrong recipient.
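  • The threshold test and confirmation prompt could look roughly like this. The threshold value, the definition of ambiguity as one minus the recogniser confidence, and the function names are assumptions made for illustration; the `ask` callback stands in for the prompt generator and the recognition of the spoken answer.

```python
from typing import Callable

AMBIGUITY_THRESHOLD = 0.4  # hypothetical value, to be tuned per recogniser


def needs_confirmation(confidence: float,
                       threshold: float = AMBIGUITY_THRESHOLD) -> bool:
    """True when the ambiguity value reaches or exceeds the threshold."""
    ambiguity = 1.0 - confidence  # assumed relation between the two measures
    return ambiguity >= threshold


def confirm_header(header: str, confidence: float,
                   ask: Callable[[str], str]) -> bool:
    """Return True if the detected header may be used for sending."""
    if not needs_confirmation(confidence):
        return True  # recognition is clear enough, no dialog needed
    answer = ask(f"Are you trying to send a {header}?")
    return answer.strip().lower() == "yes"
```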
  • the control information part, in a preferred embodiment, is also transmitted at least partially to the receiving device, where it is interpreted to control the output of the audio message. This is particularly useful when information pertaining to identification of the recipient, for example the identifier string, is also transmitted.
  • the user can be identified on the part of the receiving device before output of the audio message or the audio body of the audio message takes place.
  • the identifier string of a user or a user group is linked to identifier characteristics of the specific user, user group, or members of a user group.
  • the identifier characteristics can be, for example, a secret sequence of characters, speaker identifier characteristics and/or video characteristics such as the biometric data of the appropriate user.
  • the authorised recipient of a certain audio message can be identified from among other possible users present in the vicinity of the receiving device at the time of reception of the message, before outputting the main part of the audio message.
  • the identifier characteristics can be stored in a memory to which the receiving device has access, and the receiving device comprises a means of identifying the recipient on the basis of these identifier characteristics.
  • a camera observes the persons present in the room, and identifies the face of the recipient with the aid of the biometric data and using known image processing techniques.
  • the device might identify the user acoustically. For example, the audio header might be output, followed by an appropriate prompt. If a user answers, he can be identified as the right user by means of speaker identification. The message is only output once the identity of the user has been successfully verified.
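  • The rule that a message is only output after successful verification can be sketched with the simplest identifier characteristic mentioned above, a secret sequence of characters. Speaker identification or face recognition would need trained models and is not shown. The stored secret, the `IDENTIFIER_CHARACTERISTICS` table and `may_output` are hypothetical, and a plain SHA-256 digest is used for brevity only.

```python
import hashlib
import hmac
from typing import Dict


def _digest(secret: str) -> bytes:
    """Digest of a secret character sequence (illustrative only)."""
    return hashlib.sha256(secret.encode("utf-8")).digest()


# Hypothetical store: identifier string -> digest of the user's secret sequence.
IDENTIFIER_CHARACTERISTICS: Dict[str, bytes] = {"carl": _digest("blue-heron-42")}


def may_output(recipient: str, offered_secret: str) -> bool:
    """Allow output only if the offered secret matches the stored one."""
    expected = IDENTIFIER_CHARACTERISTICS.get(recipient.lower())
    if expected is None:
        return False  # unknown recipient: never output
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(expected, _digest(offered_secret))
```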
  • the sender of an audio message can be identified by means of identifier characteristics, and corresponding information regarding the sender can be transmitted along with the audio message.
  • if the sender is named in the audio header, for example in the form of "Message from Bob to Carl", it is possible to check the validity of the sender with the aid of the identifier characteristics.
  • an audio message should be output immediately to the authorised recipient, on account of topicality.
  • the output would be unsuitable, for example when a secret or private message should be output, and the recipient is not alone in the room, or is otherwise occupied and is not able to receive the message. It might be that the recipient is caught up in a conversation or phone-call.
  • a preferred method according to the invention automatically analyses the situation in which an identified recipient is currently involved, and the audio message is presented to the recipient in a specific form and/or at a specific time depending on the situation. For example, if the recipient is present and not engaged in an absorbing task (such as a telephone conversation), an incoming message can be played immediately. Otherwise the message can be buffered and played as soon as the user enters the room or concludes his task. If an interruption of longer messages is necessary (e.g.
  • a very satisfactory receiving device is realised as a dialog system with the additional ability to receive pictures of its environment by means of a camera or similar device.
  • the identity of the recipient and/or the current situation could then be determined by using known image processing techniques.
  • a very easy method of identifying the recipient and/or analysing the current situation is to initiate an automatic dialog between the audio messaging system/receiving device and the recipient. For example the device could precede the dialog described above by outputting the audio header "Message for Carl", and then issuing the prompt "Are you ready to receive the message?".
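  • The situation-dependent presentation might be reduced to a small decision function such as the one below. The `Situation` fields are assumed to come from a camera or from a dialog with the recipient; the field names, the `Action` values and the rule that private or secret messages wait until the recipient is alone are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    PLAY_NOW = "play_now"
    BUFFER = "buffer"


@dataclass
class Situation:
    recipient_present: bool   # recipient was identified near the device
    recipient_busy: bool      # e.g. in a conversation or phone call
    others_present: bool      # third parties are in the room


def decide(mode: Optional[str], situation: Situation) -> Action:
    """Choose between immediate output and buffering of a received message."""
    if not situation.recipient_present or situation.recipient_busy:
        return Action.BUFFER
    if mode in {"private", "secret"} and situation.others_present:
        return Action.BUFFER  # wait until the recipient is alone
    return Action.PLAY_NOW
```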
  • the audio messaging system besides the transmitting device located in the vicinity of the sender, also requires a receiving device located in the vicinity of the actual recipient.
  • a suitable transmitting device should comprise at least the following components: - a user interface for collecting a sender's audio message; message analysing means for analysing the audio message for detecting a control information part concerning communication specifications of the audio message, and a main part comprising the effective message which is to be sent to a specific recipient; - an interpreting unit for at least partially interpreting the control information part of the audio message which controls the audio messaging system with respect to communicating of the audio message; a transmitting interface for transmitting at least the main part of the audio message to a receiving device.
  • a suitable receiving device should comprise at least the following components: a receiving interface for receiving an audio message sent by a transmitting device and comprising a control information part concerning communication specifications of the audio message and a main part comprising the effective message sent to a specific recipient; - a user interface for presenting at least the main part of the audio message to the recipient; an interpreting unit for at least partially interpreting the control information part of the audio message which controls the audio messaging system with respect to presentation of the audio message.
  • the transmitting device and/or the receiving device are preferably realised as dialog systems.
  • the transmitting device and receiving device can be constructed identically and can comprise all necessary components for transmitting as well as receiving messages.
  • Dialog systems used for other purposes such as control of other devices can be equipped with appropriate components, so that such a dialog system can be used as transmitting device and/or receiving device for an audio messaging system according to the present invention.
  • the transmitting device and the receiving device comprise part of a dialog system such as that described in DE 102 49 060 A1.
  • the dialog system need only be further equipped with an appropriate message analysing means, an interpreting unit and a transmitter/receiver interface in order to be able to transfer audio messages via a communication network.
  • the message analysing means might be essentially the speech recognition unit already present in this device, supplied with the appropriate vocabulary for detection of the audio header.
  • An interpreting unit for interpreting the control information part of the audio message can preferably be realised as a software routine within the actual dialog control unit, or in a different form of software running on a processor of the dialog system.
  • the interpreting unit must be able to convert the control directives contained in the audio header into control signals, so that the message is sent in the intended manner from the sender's transmitting device to the receiving device of the recipient, or that the received message is presented in the correct manner to the right recipient by the receiving device.
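  • As a rough sketch of such an interpreting unit realised in software, control words from the audio header can be mapped to a set of control signals. The signal names and the meaning assigned to each control word are hypothetical; the application does not define them.

```python
from typing import Dict, Iterable

# Hypothetical mapping from control words to control-signal settings.
CONTROL_WORDS: Dict[str, Dict[str, object]] = {
    "urgent": {"priority": "high"},
    "private": {"verify_recipient": True},
    "secret": {"verify_recipient": True, "require_recipient_alone": True},
}


def interpret(control_words: Iterable[str]) -> Dict[str, object]:
    """Convert the control words found in an audio header into control signals."""
    signals: Dict[str, object] = {
        "priority": "normal",
        "verify_recipient": False,
        "require_recipient_alone": False,
    }
    for word in control_words:
        # unknown words are ignored rather than treated as errors
        signals.update(CONTROL_WORDS.get(word.lower(), {}))
    return signals
```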
  • Fig. 1 is a schematic diagram showing one embodiment of an audio messaging system according to the invention
  • Fig. 2 is a perspective view of a preferred embodiment of the transmitting and/or receiving device for an audio messaging system according to Figure 1
  • Fig. 3 shows a very simple example for an audio message with a structure according to the invention
  • Fig. 4 is a flow chart which shows a process flow in a transmitting device commencing with user input up to transmission of the audio message.
  • Figure 1 shows an audio messaging system with, for the sake of simplicity, only two devices, namely a transmitting device 2T in the vicinity of the sender Us, and a receiving device 2R in the vicinity of a recipient UR, where the transmitting device 2T and the receiving device 2R are connected to each other by means of a network N.
  • the communication network N can be any kind of network, such as a telephone network, a mobile telephony network, the internet, an office intranet or a home-communication network. It is only necessary that the two devices 2T and 2R can communicate with each other by means of appropriate interfaces 14.
  • such an audio messaging system 1 comprises a considerably greater number of devices. Any number of devices might be incorporated.
  • a certain message may be sent only from one particular device to another device.
  • Such a message can be sent simultaneously to several devices, for instance to send a message from one user to a user group, i.e. to many recipients.
  • the transmitting device 2T and the receiving device 2R are generally constructed in the same manner, i.e. they can be used for both receiving and transmitting audio messages.
  • the references 2T and 2R only serve to distinguish between receiving device 2R and transmitting device 2T for the sake of clarity.
  • a message can also be transmitted in the opposite direction.
  • the devices can therefore also be regarded as transceiving devices 2T, 2R. Such a transceiving device 2T, 2R is constructed in an advantageous arrangement as a dialog system.
  • a dialog system of this kind comprises, along with other components not shown in the figure, a user interface 10 with an arrangement for picking up or collecting audio signals from a user such as speech or singing, by means of a microphone or something similar.
  • This user interface 10 also features an acoustic output arrangement 12, such as a loudspeaker.
  • the user interface 10 can comprise components for visual output or input, such as a display and/or a camera. In a preferred embodiment, shown in Fig.
  • the user interface is moveable, for example can rotate about an axis, and mounted on a housing 18, which might contain any further components of the transceiving device 2T, 2R.
  • the user interface 10 has a clearly recognisable front aspect 17, comprising a loudspeaker 12, two microphones 11, and a camera 16.
  • this embodiment might comprise a display unit (not shown in the figure) for visual output of information.
  • a preferred dialog system with such a display unit is the home dialog system described in DE 102 49 060 A1, which is incorporated herewith in its entirety.
  • the additional functionality advantageous for the present invention and achieved with such a realisation of the transceiving device 2T, 2R is explained at a later point.
  • the dialog system also comprises an audio control unit 8 which, for example, controls the audio functions of the user interface 10 and prepares incoming speech signals for later processing steps.
  • An example of such a later processing step is an automatic speech recognition arrangement 7, comprising an actual speech recognition unit 5 followed by a subsequent language understanding unit 6.
  • the incoming speech signals of the user Us can be analysed and recognised in the usual manner, i.e. the underlying meaning of the spoken input can be determined.
  • the speech recognition results are then forwarded to the dialog control unit 3, which controls the actual dialog with the user, and works together with an application - in this case a message transceiving application 13 - in order to send or receive an audio message.
  • This message transceiving application 13 along with a physical network interface 14 connecting to the communication network N, ensures that the message can be sent and received in an appropriate electronic form.
  • the message transceiving application 13 together with the network interface 14 can therefore also be regarded as a "receiving interface" or "transmitting interface" or also as a "transceiving interface" as appropriate.
  • the system also features a prompt generator 9 for generating output prompts.
  • a prompt generator 9 can output pre-generated prompts retrieved from a memory, or can comprise a speech generation unit for converting text prompts into speech signals, which can be output as synthetic speech by means of the audio control unit 8 and the loudspeaker 12 of the user interface 10.
  • An audio message of a sending user Us can be sent to a recipient UR, in this case another individual user, in the following manner:
  • the sender Us speaks the audio message AM which is detected by the user interface 10, or more precisely the audio detection arrangement 11, of the transceiving device 2T.
  • the recorded speech signals are then pre-processed by the audio control unit 8 and forwarded to the kernel of the automatic speech recognition unit 5, which analyses the utterance of the user Us together with the subsequent language understanding unit 6.
  • such an audio message AM comprises a control information part CP (audio header) along with the actual information to be transmitted which is the so-called main part MP. This structure is shown in Figure 3.
  • the message shown here "Private message to Carl: the meeting starts at 7.00pm" contains the control information part CP "Private message to Carl", followed by the main part MP "The meeting starts at 7.00pm".
  • the automatic speech recognition arrangement 7 is configured in such a way that it can identify the control information part CP and separate this from the main part MP.
  • the vocabulary of the automatic speech recognition arrangement 7 contains certain control words CW, which, if they occur within a certain syntax, will be identified as belonging to a control information part CP of an audio message AM.
  • These control words CW are stored in a memory unit 15 within the transmitting device 2T. Furthermore, this memory unit 15 also stores identifier strings IS, such as nicknames of various users of the audio messaging system which might be possible recipients.
  • a corresponding "buddy list", containing nicknames of potential recipients and their addresses within the audio messaging system 1, can be assembled by the user of the transmitting device 2T.
  • This list can be stored in the transmitting device 2T or at another location of the audio messaging system 1, for example on a server of a service provider.
  • both main part MP and control information part CP of the audio message AM are passed from the automatic speech recognition arrangement 7 to the dialog control module 3, in which an interpreting unit 4, for example in the form of software routines, is installed.
  • This interpreting unit 4 also has access to the control words CW and identifier strings IS in the memory 15, and therefore can interpret the control information part CP of the audio message AM in order to generate corresponding control signals for the audio messaging system 1, particularly the transmitting device 2T, and thus to control the audio messaging system
  • the dialog control unit 3 initiates a dialog by, for example, causing the prompt generator 9 to issue an appropriate prompt to the sender Us, for instance "Are you trying to send a private message to Carl?".
  • the sender Us can answer with a simple "Yes" or "No", as appropriate, either to confirm a presumed control header CP, or to terminate the procedure in the case of an erroneously detected control header CP.
  • the dialog control unit 3 passes the main part MP, and preferably also the control information part CP, to the message transceiving application 13 and simultaneously passes on any corresponding control signals, so that the audio message AM can be communicated, via the communication network N, to the address of the receiving device 2R of the user with the nickname "Carl".
  • control information part CP and the main part MP of the audio message AM are then transmitted to the receiving device 2R via the network interface 14 connected to the communication network N.
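  • One way to put the control information part and the main part into "an appropriate electronic form" for the network interface is a small serialised envelope. The JSON layout, the field names and the `pack`/`unpack` helpers below are assumptions for illustration; the application does not prescribe a wire format.

```python
import base64
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass
class Envelope:
    address: str        # address of the receiving device
    control_part: str   # recognised text of the control information part CP
    main_part_b64: str  # recorded audio of the main part MP, base64-encoded


def pack(address: str, control_part: str, main_audio: bytes) -> bytes:
    """Serialise CP and MP for transmission over the network."""
    envelope = Envelope(address, control_part,
                        base64.b64encode(main_audio).decode("ascii"))
    return json.dumps(asdict(envelope)).encode("utf-8")


def unpack(payload: bytes) -> Tuple[str, str, bytes]:
    """Recover address, control part and audio bytes on the receiving side."""
    data = json.loads(payload.decode("utf-8"))
    return (data["address"], data["control_part"],
            base64.b64decode(data["main_part_b64"]))
```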
  • the sequence of operation within the transmitting device 2T is shown in the flow chart of Figure 4. The process commences at step I with the user input.
  • In step II an appropriate analysis determines whether the user input comprises an audio header CP, whereby the following step III checks to see if all the required parts of an audio header are present and clearly identifiable. Otherwise, step IV initiates a dialog, i.e. questions are put to the user and the answers are analysed until all the required parts of an audio header have been identified.
  • a typical case of misinterpretation might arise with the following: "Private message to Julie: Ann, shall we meet for lunch today?". This message might be interpreted to give the audio header "Private message to Julie" and the main part “Ann, shall we meet for lunch today?" or the audio header "Private message to Julian" and "Shall we meet for lunch today?".
  • In step V, the audio body, i.e. the main part MP, can be separated from the audio header CP. Subsequently, further processing steps are possible within the dialog. In the example above, the user is asked whether further information is to be sent with the audio message AM, i.e. whether an image or a video is to be transmitted. Other attachments might equally accompany the audio message AM, such as a document. If the user confirms, the processing step VII can determine which image or video is to be attached to the message.
  • Another prompt in step VI can ask whether any more pictures, videos etc. are to be attached.
  • Step VIII concludes transmission of the message.
  • the control information part CP and the main part MP of the audio message AM are received over the network interface 14 and processed by the message transceiving application 13 in the device.
  • Output of the message is performed by the dialog control unit 3, if necessary the prompt generator 9, and the audio control unit 8 as well as the loudspeaker 12 of the user interface 10 of the receiving device 2 R .
  • the receiving device 2 R analyses the situation in advance.
  • the moveable user interface (see Fig. 2) might swivel about in order to scan the entire room with the aid of the camera 16.
  • the intended recipient UR can be identified with the aid of the identifier characteristics IC associated with the various identifier strings IS stored in the memory.
  • the identifier string IS accompanying the message is used by the message transceiving application 13, or a similarly suitable module of the receiving device 2R, to retrieve the corresponding identifier characteristics IC from the memory 15 and to identify the recipient UR using these identifier characteristics IC.
  • the identifier characteristics IC might be biometric data used in image processing to identify the recipient UR from among other persons in the room. Equally, speaker identification characteristics can be applied.
  • the dialog control unit 3 can ensure that only the audio header CP ("Private message to Carl") is output via the audio control unit 8 and the user interface 10 of the receiving device 2R, followed by the supplement "Would you like to listen to the message right away?", generated by the prompt generator 9.
  • the spoken answer can be analysed in turn by the speech recognition unit 5 and the language understanding unit, and simultaneously checked for validity by speaker identification, whereby extracted characteristics are compared with the identifier characteristics IC in the memory 15, to determine whether the right user and authorised recipient UR is answering.
  • With the aid of the camera 16, it can be determined whether the user is involved in a conversation with other users, whether he is making a phone call, or is involved in any other situation making him unable to receive the message. If the recipient UR is not in the room, or not able to receive the message AM, the message is buffered and output at a later point in time. If the recipient UR indicates that he would like to listen to the message in privacy, the receiving device 2R will also buffer the audio message AM and not play it until the recipient UR is alone again in the room, or until the recipient UR has ensured that he will be able to privately listen to the audio message AM, for example by wearing headphones or similar.
  • the user interface 10 of the receiving device 2R advantageously turns to present its front aspect 17 to the authorised recipient of the message, recognised by the receiving device 2R, i.e. the receiving device 2R turns to directly face the recipient UR when outputting a dialog prompt or the audio message AM or the main part of the audio message AM.
  • Other advantageous means of outputting or usage of the receiving device 2R or transceiving device 2, realised in the form of a dialog system, are described in the document DE 102 49 060 A1.
  • a “unit” may comprise a number of blocks or devices, unless explicitly described as a single entity.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Telephonic Communication Services (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention describes a method for sending an audio message (AM) from a sender (US) to a recipient (UR) over an audio messaging system. Thereby, a sender's (US) audio message is first collected by a transmitting device (2T). The audio message (AM) is then analysed for detection of a control information part (CP) concerning communication specifications of the message (AM) and a main part (MP) comprising the effective message which is to be sent to the recipient (UR). The control information part (CP) of the audio message (AM) is at least partially interpreted for controlling the audio messaging system (1) for communicating the (specific) audio message (AM). At least the main part (MP) of the audio message (AM) is transmitted to a receiving device (2R) and presented to the recipient (UR). Furthermore, an appropriate audio messaging system, a transmitting device and a receiving device for such an audio messaging system are described.

Description

METHOD AND SYSTEM FOR SENDING AN AUDIO MESSAGE
This invention relates to a method for sending an audio message from a sender to a recipient over an audio messaging system and to an appropriate audio messaging system. Further the invention relates to a transmitting device and to a receiving device for such an audio messaging system.
The popularity of text-based messaging services has increased immensely since their introduction a few years ago. The widespread Short Messaging Service (SMS) is just one example of such a service. Text news systems like AOL's Instant Messenger, Microsoft's MSN Messenger and Yahoo's Messenger for PCs can be used free of charge after downloading the required free software. Some of these PC-based messaging providers offer a voice-chat functionality in addition to the text messaging services. Furthermore, some other providers have specialised in voice chat, ultimately leading to a voice-over-IP (internet protocol) scenario.
A notable distinction between voice chat functionality and text messaging is the possibility for the user to interact explicitly, for example by choosing a chat window and typing there or by other actions like writing a word document and sending it. On the other hand, voice interaction is continually transmitted, i.e. an uninterrupted interchange takes place. This is often not what the user really wants, for example when he is in a room with other people and only wishes to transmit specific remarks as messages, whereas the remarks directed by him at the other people in the room generally should not be transmitted. Normal telephony allows the user to circumvent this problem by covering the microphone with his hand or switching the telephone to mute. Evidently, this is not possible when using a hands-free telephone or headset.
The recipient of a message has a similar problem — while it is possible to read private messages received using text-based messaging services, even when a third party is in the same room, by reading the messages from a screen or a display which cannot be viewed by the third party, it is next to impossible to ensure that audible messages not be heard by third parties for whom the messages are not intended, unless the messages are listened to through headphones. Text messaging systems do indeed appear to enjoy a greater acceptance level than voice chat functionality. This is probably owing to the tendency that users do not really desire a permanent conversation experience. On the one hand, they want to be able to connect to the other person. On the other hand, they would just as equally like to be connected in an offline mode in which they are not permanently involved in an ongoing conversation in which all their remarks are communicated.
Therefore, an object of the present invention is to provide a method for sending an audio message from a sender to a recipient over an audio messaging system and an appropriate audio messaging system which offers to the user essentially the same experience as text messaging systems. In particular the user should be able to easily send specific utterances as audio messages while excluding other utterances from being sent by the messaging system. To this end, the present invention provides a method for sending an audio message from a sender to a recipient over an audio messaging system comprising the following steps: First, a sender's audio message is collected by a transmitting device. The message is usually generated by the sender speaking the message. Nevertheless it is also possible that the sender generate the message or parts of the message in another form, for example by singing, playing on an instrument, clapping hands etc. This audio message will then be analysed to detect a control information part, also called "audio header" in the following, containing directives such as details of communication specifications of the message; and a main part comprising the effective message or effective information which is to be sent to the recipient, also called "audio body" in the following. The terms "sender" and "receiver" do not necessarily imply individual users, but can mean user groups, a member or all members of such a group. A user group might use a single shared transmitting or receiving device, for example members of a family to whom the device belongs, or employees in an office using a device designated for that office. A user group might also mean a group of users each of whom has his own device, in which case a message destined for the user group will be transmitted to all receiving devices. 
The communication specifications of the message, incorporated in the control information part, may be any kind of transmission and/or presentation specification like, for example, a message type and/or a sending mode, e.g. information specifying that the message is secret, private, urgent etc. The control information part could also include information for sender identification or for specifying the recipient of the message. For instance, a typical audio header may be "Private message from Bob to Carl". This control information part of the audio message is at least partially interpreted for controlling the audio messaging system for transmitting and/or presenting the specific audio message. For example, a control signal for the transmitting device and/or the receiving device and/or other parts like transceiving stations, router etc. of the audio messaging system may be generated based on the control information part. In a further step, at least the main part of the audio message is sent to a receiving device located in the vicinity of the recipient and is presented there to the recipient.
An appropriate audio messaging system for sending an audio message from a sender to a recipient according to this method comprises a transmitting device with a user interface for collecting a sender's audio message and message analysing means for analysing the audio message for detecting a control information part concerning communication specifications of the audio message and a main part comprising the actual message which is to be sent to the recipient. Further, the audio messaging system comprises an interpreting unit for at least partially interpreting the control information part of the audio message for controlling the audio messaging system for communicating the specific audio message. Additionally, the audio messaging system comprises a receiving device with a user interface for presenting at least the main part of the audio message to the recipient.
Finally, the audio messaging system requires a means for transmitting at least the main part of the audio message from the transmitting device to the receiving device.
With the aid of the method and the audio messaging system according to the present invention, the user controls the audio messaging system by commands embedded in the audio message, thus avoiding a continual transmission of everything he says. In other words, the user can provide the system with "meta-information" in an utterance along with the actual audio content of the message. The system analyses the audio message accordingly and separates the audio header containing the control information from the audio body with any utterances intended for transmission. If the system is unable to detect an audio header with appropriate directions for communicating a message to a particular person in a particular manner, then nothing will be transmitted.
This is illustrated in the following simple example: assuming a user of the system says "Message to Carl: the soccer match starts at 7.00 pm", this utterance will be picked up by the user interface of the transmitting device and analysed. The audio header "Message to Carl" will be detected and interpreted, and the message "The soccer match starts at 7.00 pm" will be transmitted to a recipient called "Carl". On the other hand, if the user simply informs another person also present in the room about the start time of the match using the remark "Pete, you know the soccer match starts at 7.00 pm", an activated audio messaging system or the corresponding transmitting device would conclude, on analysis of the utterance, that it does not contain an audio header. The utterance would, as a result, not be identified as an audio message and would not be transmitted.
Therefore, the invention provides an exceptionally simple and user-friendly means of controlling the system, so that only certain utterances are transmitted by the audio messaging system to other persons, without having to first deactivate the system or parts of the system, for example a microphone or loudspeaker. Furthermore, the sending user can control the system with respect to transmitting the message and presenting it, whereby all control directives can be comfortably included in the message by means of appropriate formulation in an audio header, without the user having to carry out any manual actions. In other words, the entire control of the audio messaging system can be comfortably carried out using a hands-free set. Thereby, such a system offers advantages over the usual speech control for typical mobile telephones, for example in automotive hands-free sets, whereby a connection to another participant can be initiated and controlled using speech commands, but in which a permanent connection is maintained thereafter between the user and the participant. All of the user's utterances are communicated to the other participant, and muting the telephone is only possible by issuing the appropriate command, or by covering the microphone etc.
The dependent claims and the subsequent description disclose particularly advantageous embodiments and features of the invention.
In a preferred embodiment of the invention, the control information part of the audio message is also at least partially transmitted to the receiving device and interpreted for controlling the presentation of the audio message to the recipient. In other words, the receiving device receives appropriate information, with the aid of the audio header, for example as to when, how and to which user(s) the audio message or the audio body of the audio message is to be output. Preferably, the audio header can also be output at least partially to the recipient.
Since the control information part preferably deals with commands spoken by the user, automatic speech recognition techniques can be used to identify the control information part within the audio message, whereby automatic speech recognition in this case does not imply speech recognition in a strict sense, but rather language understanding techniques. To this end, the transmitting device should comprise an automatic speech recognition arrangement.
To assist the identification of the control information part within the audio message, the audio message is preferably built up in a defined composite structure in which the control information part is positioned at a specific position respective to the main part. More preferably, the control information part is positioned at the beginning of the audio message and followed by the main part. The advantage of this is that the control information part is the first to be detected by the speech recognition arrangement, and the following main part need only be buffered or prepared for transmission. The control information part can, however, be located at any suitable position within the message, for example at the end of the message, or the control information part might be distributed over several positions in the message, so that certain control information is located at the start of the message and further control information is located towards the middle or at the end of the message.
Analysis of the audio message with the aid of an automatic speech recogniser might involve, for example, searching for certain key-words that might be stored by the audio messaging system in an appropriate memory such as a storage unit in the transmitting device or receiving device. Typical examples of such key-words might be "message", "message to" etc., descriptors for possible recipients of the messages, as well as key-words specifying the type of message or manner of transmission, for example "secret", "private" or "urgent".
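The key-word search described above can be illustrated with a minimal Python sketch. It assumes that the speech recogniser has already produced a text transcript and that the audio header sits at the beginning of the utterance. The names `parse_utterance`, `ParsedMessage`, `MESSAGE_TYPES` and `KNOWN_RECIPIENTS` are hypothetical and do not appear in the source; a real language understanding unit would be considerably more robust than a regular expression.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical vocabulary standing in for the stored key-words and nicknames.
MESSAGE_TYPES = ("secret", "private", "urgent")
KNOWN_RECIPIENTS = {"carl", "bob", "julie"}


class ParsedMessage(NamedTuple):
    message_type: Optional[str]  # e.g. "private", or None if unspecified
    recipient: str               # lower-cased nickname
    main_part: str               # text that would be transmitted


# Header at the start: optional type, "message to", a nickname, then the body.
_HEADER = re.compile(
    r"^\s*(?:(?P<type>" + "|".join(MESSAGE_TYPES) + r")\s+)?"
    r"message\s+to\s+(?P<to>\w+)\s*[:,]?\s*(?P<body>.+)$",
    re.IGNORECASE | re.DOTALL,
)


def parse_utterance(transcript: str) -> Optional[ParsedMessage]:
    """Return header and main part, or None if no audio header is found."""
    match = _HEADER.match(transcript)
    if match is None:
        return None  # no header: the utterance is not an audio message
    recipient = match.group("to").lower()
    if recipient not in KNOWN_RECIPIENTS:
        return None  # unknown nickname: treat as ordinary speech
    message_type = match.group("type")
    return ParsedMessage(
        message_type.lower() if message_type else None,
        recipient,
        match.group("body").strip(),
    )
```

With this sketch, an utterance such as "Pete, you know the soccer match starts at 7.00 pm" yields `None` and would therefore not be transmitted, whereas "Message to Carl: ..." yields a recipient and a main part.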
To make the transmission of messages as easy as possible, unique identifier strings are associated with the possible users or user groups of the audio messaging system. Such a unique identifier string might comprise, for example, the user's real name, or might equally well be any other string concealing the identity of the various users. In particular, entire user groups can be identified collectively using a single string. The use of nicknames or fantasy names which can most easily be recalled by the other users is preferred. These nicknames are included in the system's vocabulary and can be used to efficiently address a fellow user in the audio header by just saying his nickname. Furthermore, groups can be defined where all connected members will receive the message if the audio header contains the name of the group. Preferably, the identifier strings of the possible recipients are stored together with corresponding address book entries in a memory of the transmitting device and, if need be, in the receiving device or in a further suitable location in the audio messaging system.
Audio messages will often be sent to a number of people at the same time. During a longer conversation the same list of recipients will be frequently used. When speaking the audio header, it is inconvenient for a user if all names of all recipients have to be spoken each time. Therefore, dynamically associating nicknames or other identifier strings with the list of relevant address book entries will make the sending of messages more comfortable. Preferably a key-word like "Reply" or similar is used to indicate in the audio header that the associated audio message should be transmitted to the sender of the last message received and possibly to all users to whom the last message was sent.
The transmitting device is preferably realised as a dialog system, comprises such a dialog system, or is part of such a dialog system.
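The mapping from identifier strings to address book entries, including groups and the "Reply" key-word, might be sketched as follows. This is only an illustration under stated assumptions: the address book contents, the address format and the function `resolve_recipients` are all hypothetical, and the source does not prescribe any particular data structure.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical address book: nickname -> address within the messaging system.
ADDRESS_BOOK: Dict[str, str] = {
    "carl": "carl@example.invalid",
    "bob": "bob@example.invalid",
    "ann": "ann@example.invalid",
}
# Hypothetical group definitions: group name -> member nicknames.
GROUPS: Dict[str, List[str]] = {"family": ["ann", "carl"]}


def resolve_recipients(
    identifier: str,
    last_sender: Optional[str] = None,
    last_recipients: Iterable[str] = (),
    own_nickname: Optional[str] = None,
) -> List[str]:
    """Map an identifier string from the audio header to a list of addresses."""
    key = identifier.lower()
    if key == "reply":
        if last_sender is None:
            return []  # nothing to reply to
        # Sender of the last message, plus its other recipients if supplied.
        names = [last_sender] + [n for n in last_recipients if n != own_nickname]
    elif key in GROUPS:
        names = GROUPS[key]
    elif key in ADDRESS_BOOK:
        names = [key]
    else:
        names = []
    addresses: List[str] = []
    for name in names:
        address = ADDRESS_BOOK.get(name)
        if address is not None and address not in addresses:
            addresses.append(address)  # keep order, drop duplicates
    return addresses
```

Passing an empty `last_recipients` gives a reply to the sender only; passing the recipient list of the last message gives a reply to everyone involved.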
In this particularly preferred case, an automatic dialog can be initiated between the audio messaging system, or more particularly the transmitting device, and the sender, in order to identify the control information part of the audio message when an ambiguity value (e.g. based on an internal confidence measure) of a recognition result of the automatic speech recogniser reaches or exceeds a certain ambiguity threshold level. In other words, if the system is uncertain as to whether a message should be sent, to whom it should be sent, or in which manner it should be sent, the system can issue a prompt to the user asking for confirmation, or can enter into a dialog with the user to allow correction of a supposed audio header. In this way, the system ensures that no message is sent unintentionally, or sent to the wrong recipient.
As already mentioned, the control information part, in a preferred embodiment, is also transmitted at least partially to the receiving device, where it is interpreted to control the output of the audio message. This is particularly useful when information pertaining to identification of the recipient, for example the identifier string, is also transmitted. With the aid of the identifier string, the user can be identified on the part of the receiving device before output of the audio message or the audio body of the audio message takes place. To this end, in a particularly preferred embodiment, the identifier string of a user or a user group is linked to identifier characteristics of the specific user, user group, or members of a user group. The identifier characteristics can be, for example, a secret sequence of characters, speaker identifier characteristics and/or video characteristics such as the biometric data of the appropriate user.
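The threshold rule for starting a confirmation dialog can be sketched in a few lines of Python. The threshold value, the function name `next_step` and the representation of the ambiguity value as a number between 0 and 1 are assumptions made for illustration; the source only states that a dialog is initiated when the ambiguity value reaches or exceeds a threshold.

```python
from typing import Optional, Tuple

AMBIGUITY_THRESHOLD = 0.4  # hypothetical value; the source gives no number


def next_step(
    message_type: Optional[str],
    recipient: str,
    ambiguity: float,
    threshold: float = AMBIGUITY_THRESHOLD,
) -> Tuple[str, str]:
    """Decide whether to send directly or to ask the sender for confirmation.

    Returns ("confirm", prompt) when the ambiguity value reaches or exceeds
    the threshold, otherwise ("send", "").
    """
    if ambiguity >= threshold:
        kind = f"a {message_type} message" if message_type else "a message"
        return "confirm", f"Are you trying to send {kind} to {recipient}?"
    return "send", ""
```

A "No" answer to the prompt would then terminate the procedure, so that an erroneously detected header does not lead to a transmission.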
With the aid of these identifier characteristics, the authorised recipient of a certain audio message can be identified from among other possible users present in the vicinity of the receiving device at the time of reception of the message, before outputting the main part of the audio message. Preferably, the identifier characteristics can be stored in a memory to which the receiving device has access, and the receiving device comprises a means of identifying the recipient on the basis of these identifier characteristics. One possibility might be that a camera observes the persons present in the room, and identifies the face of the recipient with the aid of the biometric data and using known image processing techniques. Alternatively, the device might identify the user acoustically. For example, the audio header might be output, followed by an appropriate prompt. If a user answers, he can be identified as the right user by means of speaker identification. The message is only output once the identity of the user has been successfully verified.
In a preferred embodiment, the sender of an audio message can be identified by means of identifier characteristics, and corresponding information regarding the sender can be transmitted along with the audio message. As long as the sender has identified himself in the audio header, for example in the form of "Message from Bob to Carl", it is possible to check the validity of the sender with the aid of the identifier characteristics.
Usually, an audio message should be output immediately to the authorised recipient, on account of topicality. However, there are situations in which the output would be unsuitable, for example when a secret or private message should be output, and the recipient is not alone in the room, or is otherwise occupied and is not able to receive the message. It might be that the recipient is caught up in a conversation or phone-call.
Taking account of such situations is particularly important, since an audio message is not enduring. If the user is not in the room or is not paying attention, and the message is output immediately, it would be irretrievably lost. To this end, a preferred method according to the invention automatically analyses the situation in which an identified recipient is currently involved, and the audio message is presented to the recipient in a specific form and/or at a specific time depending on the situation. For example, if the recipient is present and not engaged in an absorbing task (such as a telephone conversation), an incoming message can be played immediately. Otherwise the message can be buffered and played as soon as the user enters the room or concludes his task. If an interruption of longer messages is necessary (e.g. due to an incoming phone-call) playback can be resumed at a later point in time.
There are different methods of automatically analysing the situation in which the recipient is currently involved. In a preferred embodiment, a very satisfactory receiving device is realised as a dialog system with the additional ability to receive pictures of its environment by means of a camera or similar device. The identity of the recipient and/or the current situation could then be determined by using known image processing techniques. A very easy method of identifying the recipient and/or analysing the current situation is to initiate an automatic dialog between the audio messaging system/receiving device and the recipient. For example the device could precede the dialog described above by outputting the audio header "Message for Carl", and then issuing the prompt "Are you ready to receive the message?". Should the user reply with "Yes", the message will be presented, otherwise it will be buffered until the user explicitly requests the message at a later time.
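The play-or-buffer behaviour described above can be illustrated with a small Python sketch. The classes `Situation`, `Message` and `Receiver` and their fields are hypothetical; how the situation is actually determined (camera, dialog, speaker identification) is left outside the sketch, and resuming an interrupted playback is not modelled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Situation:
    recipient_present: bool  # identified recipient is in the room
    recipient_busy: bool     # e.g. in a conversation or on the phone
    others_present: bool     # third parties could overhear the output


@dataclass
class Message:
    main_part: str
    private: bool = False    # taken from the control information part


def can_play_now(message: Message, situation: Situation) -> bool:
    """Immediate output only if the recipient can receive it appropriately."""
    if not situation.recipient_present or situation.recipient_busy:
        return False
    if message.private and situation.others_present:
        return False
    return True


class Receiver:
    """Buffers messages and releases them when the situation allows."""

    def __init__(self) -> None:
        self.buffer: Deque[Message] = deque()

    def on_message(self, message: Message, situation: Situation) -> List[str]:
        self.buffer.append(message)
        return self.on_situation_change(situation)

    def on_situation_change(self, situation: Situation) -> List[str]:
        played: List[str] = []
        remaining: Deque[Message] = deque()
        while self.buffer:
            message = self.buffer.popleft()
            if can_play_now(message, situation):
                played.append(message.main_part)  # stands in for audio output
            else:
                remaining.append(message)
        self.buffer = remaining
        return played
```

Calling `on_situation_change` again once the recipient is alone or has finished a phone call releases any buffered messages in their original order.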
As already described above, the audio messaging system, besides the transmitting device located in the vicinity of the sender, also requires a receiving device located in the vicinity of the actual recipient.
A suitable transmitting device should comprise at least the following components:
- a user interface for collecting a sender's audio message;
- message analysing means for analysing the audio message for detecting a control information part concerning communication specifications of the audio message, and a main part comprising the effective message which is to be sent to a specific recipient;
- an interpreting unit for at least partially interpreting the control information part of the audio message which controls the audio messaging system with respect to communication of the audio message;
- a transmitting interface for transmitting at least the main part of the audio message to a receiving device.
A suitable receiving device should comprise at least the following components:
- a receiving interface for receiving an audio message sent by a transmitting device and comprising a control information part concerning communication specifications of the audio message and a main part comprising the effective message sent to a specific recipient;
- a user interface for presenting at least the main part of the audio message to the recipient;
- an interpreting unit for at least partially interpreting the control information part of the audio message which controls the audio messaging system with respect to presentation of the audio message.
As already explained above, the transmitting device and/or the receiving device are preferably realised as dialog systems. The transmitting device and receiving device can be constructed identically and can comprise all necessary components for transmitting as well as receiving messages.
Dialog systems used for other purposes such as control of other devices can be equipped with appropriate components, so that such a dialog system can be used as transmitting device and/or receiving device for an audio messaging system according to the present invention. In an especially preferred embodiment, the transmitting device and the receiving device comprise part of a dialog system such as that described in DE 102 49 060 A1. In this case, the dialog system need only be further equipped with an appropriate message analysing means, an interpreting unit and a transmitter/receiver interface in order to be able to transfer audio messages via a communication network. The message analysing means might be essentially the speech recognition unit already present in this device, supplied with the appropriate vocabulary for detection of the audio header. An interpreting unit for interpreting the control information part of the audio message can preferably be realised as a software routine within the actual dialog control unit, or in a different form of software running on a processor of the dialog system. The interpreting unit must be able to convert the control directives contained in the audio header into control signals, so that the message is sent in the intended manner from the sender's transmitting device to the receiving device of the recipient, or that the received message is presented in the correct manner to the right recipient by the receiving device.
Other objects and features of the present invention will become apparent from the following detailed descriptions considered in conjunction with the accompanying drawings. It is to be understood, however, that the drawings are designed solely for the purposes of illustration and not as a definition of the limits of the invention.
Fig. 1 is a schematic diagram showing one embodiment of an audio messaging system according to the invention;
Fig. 2 is a perspective view of a preferred embodiment of the transmitting and/or receiving device for an audio messaging system according to Figure 1;
Fig. 3 shows a very simple example for an audio message with a structure according to the invention;
Fig. 4 is a flow chart which shows a process flow in a transmitting device commencing with user input up to transmission of the audio message.
Figure 1 shows an audio messaging system with, for the sake of simplicity, only two devices, namely a transmitting device 2T in the vicinity of the sender Us, and a receiving device 2R in the vicinity of a recipient UR, where the transmitting device 2T and the receiving device 2R are connected to each other by means of a network N. The communication network N can be any kind of network, such as a telephone network, a mobile telephony network, the internet, an office intranet or a home-communication network. It is only necessary that the two devices 2T and 2R can communicate with each other by means of appropriate interfaces 14.
Generally, such an audio messaging system 1 comprises a considerably greater number of devices. Any number of devices might be incorporated. In particular, it is not necessary that a certain message only be sent from one particular device to another device. Such a message can be sent simultaneously to several devices, for instance to send a message from one user to a user group, i.e. to many recipients.
In the example shown, the transmitting device 2T and the receiving device 2R are generally constructed in the same manner, i.e. they can be used for both receiving and transmitting audio messages. The references 2T and 2R only serve to distinguish between receiving device 2R and transmitting device 2T for the sake of clarity. In general, a message can also be transmitted in the opposite direction.
Therefore, to simplify matters, the devices will also be referred to as "transceiving devices" 2T, 2R where appropriate. Such a transceiving device 2T, 2R is constructed in an advantageous arrangement as a dialog system. A dialog system of this kind comprises, along with other components not shown in the figure, a user interface 10 with an arrangement for picking up or collecting audio signals from a user such as speech or singing, by means of a microphone or something similar. This user interface 10 also features an acoustic output arrangement 12, such as a loudspeaker. Furthermore, the user interface 10 can comprise components for visual output or input, such as a display and/or a camera.
In a preferred embodiment, shown in Fig. 2, the user interface is moveable, for example can rotate about an axis, and is mounted on a housing 18, which might contain any further components of the transceiving device 2T, 2R. The user interface 10 has a clearly recognisable front aspect 17, comprising a loudspeaker 12, two microphones 11, and a camera 16. Furthermore, this embodiment might comprise a display unit (not shown in the figure) for visual output of information. A preferred dialog system with such a display unit is the home dialog system described in DE 102 49 060 A1, which is incorporated herewith in its entirety. The additional functionality advantageous for the present invention and achieved with such a realisation of the transceiving device 2T, 2R, is explained at a later point.
Further components of the transceiving device 2T, 2R are an audio control unit 8, which, for example, controls the audio functions of the user interface 10 and prepares incoming speech signals for later processing steps. An example of such a later processing step is an automatic speech recognition arrangement 7, comprising an actual speech recognition unit 5 followed by a subsequent language understanding unit 6.
With the aid of these components, the incoming speech signals of the user Us can be analysed and recognised in the usual manner, i.e. the underlying meaning of the spoken input can be determined. The speech recognition results are then forwarded to the dialog control unit 3, which controls the actual dialog with the user, and works together with an application, in this case a message transceiving application 13, in order to send or receive an audio message. This message transceiving application 13, along with a physical network interface 14 connecting to the communication network N, ensures that the message can be sent and received in an appropriate electronic form. The message transceiving application 13 together with the network interface 14 can therefore also be regarded as a "receiving interface", a "transmitting interface" or a "transceiving interface", as appropriate.
Since output to the user is necessary to allow a dialog with the user Us, UR, the system also features a prompt generator 9 for generating output prompts. Such a prompt generator 9 can output pre-generated prompts retrieved from a memory, or can comprise a speech generation unit for converting text prompts into speech signals, which can be output as synthetic speech by means of the audio control unit 8 and the user interface 10.
An audio message of a sending user Us can be sent to a recipient UR, in this case another individual user, in the following manner: The sender Us speaks the audio message AM, which is detected by the user interface 10, or more precisely the audio detection arrangement 11, of the transceiving device 2T. The recorded speech signals are then pre-processed by the audio control unit 8 and forwarded to the speech recognition unit 5, the kernel of the automatic speech recognition arrangement 7, which analyses the utterance of the user Us together with the subsequent language understanding unit 6.
According to the invention, such an audio message AM comprises a control information part CP (audio header) along with the actual information to be transmitted, the so-called main part MP. This structure is shown in Figure 3. The message shown here, "Private message to Carl: the meeting starts at 7.00pm", contains the control information part CP "Private message to Carl", followed by the main part MP "The meeting starts at 7.00pm". The automatic speech recognition arrangement 7 is configured in such a way that it can identify the control information part CP and separate it from the main part MP. To this end, the vocabulary of the automatic speech recognition arrangement 7 contains certain control words CW, which, if they occur within a certain syntax, will be identified as belonging to a control information part CP of an audio message AM. These control words CW are stored in a memory unit 15 within the transmitting device 2T. Furthermore, this memory unit 15 also stores identifier strings IS, such as nicknames of various users of the audio messaging system 1 which might be possible recipients. A corresponding "buddy list", containing nicknames of potential recipients and their addresses within the audio messaging system 1, can be assembled by the user of the transmitting device 2T. This list can be stored in the transmitting device 2T or at another location of the audio messaging system 1, for example on a server of a service provider.
In the example shown in the figures, both main part MP and control information part CP of the audio message AM are passed from the automatic speech recognition arrangement 7 to the dialog control unit 3, in which an interpreting unit 4, for example in the form of software routines, is installed.
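Purely by way of illustration, the separation of the control information part CP from the main part MP might be sketched as follows, operating on the recognised text rather than on the audio signal itself. The control words, the syntax "control words, then `to`, then an identifier string, then a colon", the buddy list entries and all function and type names are assumptions made for this sketch; the description does not prescribe any of them.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical contents of memory unit 15: control words CW and a buddy list
# mapping identifier strings IS (nicknames) to addresses in the system.
CONTROL_WORDS = ("private message", "message")
BUDDY_LIST = {"carl": "carl@example.invalid", "julie": "julie@example.invalid"}


class AudioMessage(NamedTuple):
    control_part: Optional[str]  # CP (audio header), None if no header was found
    recipient: Optional[str]     # identifier string IS taken from the header
    main_part: str               # MP, the effective message


def split_message(transcript: str) -> AudioMessage:
    """Split recognised text into control information part CP and main part MP."""
    # Longest control phrase first, so "private message" wins over "message".
    words = "|".join(
        re.escape(w) for w in sorted(CONTROL_WORDS, key=len, reverse=True)
    )
    pattern = re.compile(
        rf"^\s*((?:{words})\s+to\s+(\w+))\s*:\s*(.+)$",
        re.IGNORECASE | re.DOTALL,
    )
    match = pattern.match(transcript)
    if match and match.group(2).lower() in BUDDY_LIST:
        return AudioMessage(match.group(1), match.group(2), match.group(3).strip())
    # No valid header: treat the whole utterance as main part.
    return AudioMessage(None, None, transcript.strip())
```

In this sketch a header naming a nickname that is not in the buddy list is not accepted as a control information part; a real system might instead start a clarification dialog with the sender.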
This interpreting unit 4 also has access to the control words CW and identifier strings IS in the memory 15, and therefore can interpret the control information part CP of the audio message AM in order to generate corresponding control signals for the audio messaging system 1, particularly the transmitting device 2T, and thus to control the audio messaging system 1, particularly the transmitting device 2T, accordingly. If the control information part CP is not clearly identifiable, the dialog control unit 3 initiates a dialog by, for example, causing the prompt generator 9 to issue an appropriate prompt to the sender Us, for instance "Are you trying to send a private message to Carl?". The sender Us can answer with a simple "Yes" or "No", as appropriate, either to confirm a presumed control header CP, or to terminate the procedure in the case of an erroneously detected control header CP.
If the system has ascertained that a control header has been correctly identified, or if the user has confirmed a presumed control header through an ensuing dialog, the main part MP of the audio message AM, attached to the audio header CP, is sent to the recipient UR specified in the audio header CP by means of the identifier string IS, which in the case of the preceding example is the user with the nickname "Carl". To this end, the dialog control unit 3 passes the main part MP, and preferably also the control information part CP, to the message transceiving application 13 and simultaneously passes on any corresponding control signals, so that the audio message AM can be communicated, via the communication network N, to the address of the receiving device 2R of the user with the nickname "Carl". The control information part CP and the main part MP of the audio message AM are then transmitted to the receiving device 2R via the network interface 14 connected to the communication network N.
The sequence of operation within the transmitting device 2T is shown in the flow chart of Figure 4. The process commences at step I with the user input. In step II, an appropriate analysis determines whether the user input comprises an audio header CP, whereby the following step III checks to see if all the required parts of an audio header are present and clearly identifiable. If not, step IV initiates a dialog, i.e. questions are put to the user and the answers are analysed until all the required parts of an audio header have been identified. A typical case of misinterpretation might arise with the following: "Private message to Julie: Ann, shall we meet for lunch today?". This message might be interpreted to give the audio header "Private message to Julie" and the main part "Ann, shall we meet for lunch today?", or the audio header "Private message to Julian" and the main part "Shall we meet for lunch today?". In this case the system may prompt "Did you want to send a private message to Julian?" The sender Us can reply "No, I wanted to send a private message to Julie". Here, the answer clarifies the misinterpretation by specifying the first of the possible alternatives.
In step V, the audio body, i.e. the main part MP, can be separated from the audio header CP. Subsequently, further processing steps are possible within the dialog. In the example shown, the user is asked in step VI whether further information is to be sent with the audio message AM, i.e. whether an image or a video is to be transmitted. Other attachments might equally accompany the audio message AM, such as a document. If the user confirms, the processing step VII can determine which image or video is to be attached to the message. Another prompt in step VI can ask whether any more pictures, videos etc. are to be attached. Once the message is complete, step VIII concludes transmission of the message.
At the receiving device 2R, the control information part CP and the main part MP of the audio message AM are received over the network interface 14 and processed by the message transceiving application 13 in the device.
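As a non-limiting illustration, steps II to V at the transmitting device might be sketched as follows. The sketch assumes that the speech recogniser delivers several competing readings of the utterance, each with a score, and treats the header as ambiguous when the two best scores lie closer together than a margin. The score margin, the yes/no form of the clarification dialog and all names are assumptions of this sketch and are not taken from the description.

```python
from typing import Callable, List, Optional, Tuple

# One reading of the utterance: (identifier string IS, main part MP, score).
Hypothesis = Tuple[str, str, float]


def resolve_header(
    hypotheses: List[Hypothesis],
    ask_user: Callable[[str], str],
    margin: float = 0.2,
) -> Optional[Tuple[str, str]]:
    """Pick a header reading, asking the sender if the choice is ambiguous."""
    if not hypotheses:
        return None  # step II: no audio header detected
    ranked = sorted(hypotheses, key=lambda h: h[2], reverse=True)
    best = ranked[0]
    # Step III: the header counts as clear if the best reading leads by `margin`.
    if len(ranked) == 1 or best[2] - ranked[1][2] >= margin:
        return best[0], best[1]
    # Step IV: clarification dialog, offering each reading in turn.
    for recipient, main_part, _score in ranked:
        answer = ask_user(f"Did you want to send a private message to {recipient}?")
        if answer.strip().lower().startswith("yes"):
            return recipient, main_part  # step V: header and body are separated
    return None  # the sender rejected every reading, so the procedure ends
```

For the "Julie: Ann" and "Julian" readings discussed above, a call with two closely scored hypotheses would first ask about the higher-scored reading and, after a "No", about the other. A fuller dialog could instead parse a corrective answer such as "No, I wanted to send a private message to Julie" directly.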
Output of the message is performed by the dialog control unit 3, if necessary the prompt generator 9, and the audio control unit 8 as well as the loudspeaker 12 of the user interface 10 of the receiving device 2R. To avoid output of the message if the intended recipient UR is not in the room, is otherwise occupied at the time, or is in the company of other persons for whom the contents of the message are not intended, the receiving device 2R analyses the situation in advance. For example, the moveable user interface 10 (see Fig. 2) might swivel about in order to scan the entire room with the aid of the camera 16. Using known image processing techniques, it can be determined whether the intended recipient UR is present in the room.
The intended recipient UR can be identified with the aid of the identifier characteristics IC associated with the various identifier strings IS stored in the memory 15. To this end, the identifier string IS accompanying the message is used by the message transceiving application 13, or a similarly suitable module of the receiving device 2R, to retrieve the corresponding identifier characteristics IC from the memory 15 and to identify the recipient UR using these identifier characteristics IC. The identifier characteristics IC might be biometric data used in image processing to identify the recipient UR from among other persons in the room. Equally, speaker identification characteristics can be applied. In this case, for example, the dialog control unit 3 can ensure that only the audio header CP, "Private message to Carl", is output via the audio control unit 8 and the user interface 10 of the receiving device 2R, followed by the supplement "Would you like to listen to the message right away?", generated by the prompt generator 9.
When the user thus addressed replies, the spoken answer can be analysed in turn by the speech recognition unit 5 and the language understanding unit 6, and simultaneously checked for validity by speaker identification, whereby extracted characteristics are compared with the identifier characteristics IC in the memory 15, to determine whether the right user and authorised recipient UR is answering. Furthermore, with the aid of the camera 16 and usual image processing techniques, it can be determined whether the user is involved in a conversation with other users, whether he is making a phone call, or is involved in any other situation making him unable to receive the message. If the recipient UR is not in the room, or not able to receive the message AM, the message is buffered and output at a later point in time. If the recipient UR indicates that he would like to listen to the message in privacy, the receiving device 2R will also buffer the audio message AM and not play it until the recipient UR is alone again in the room, or until the recipient UR has ensured that he will be able to listen to the audio message AM privately, for example by wearing headphones or similar.
The user interface 10 of the receiving device 2R advantageously turns to present its front aspect 17 to the authorised recipient of the message, recognised by the receiving device 2R, i.e. the receiving device 2R turns to directly face the recipient UR when outputting a dialog prompt, the audio message AM, or the main part of the audio message AM. Other advantageous means of outputting or usage of the receiving device 2R or transceiving device 2T, 2R, realised in the form of a dialog system, are described in the document DE 102 49 060 A1.
Although the present invention has been disclosed in the form of preferred embodiments and variations thereon, it will be understood that numerous additional modifications and variations could be made thereto without departing from the scope of the invention.
In particular, the transmitting device and/or the receiving device might, for example, be constructed using a different architecture than that described. For the sake of clarity, it is also to be understood that the use of "a" or "an" throughout this application does not exclude a plurality, and "comprising" does not exclude other steps or elements. A "unit" may comprise a number of blocks or devices, unless explicitly described as a single entity.

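As a further non-limiting illustration, the way in which the receiving device 2R handles a received audio message AM, depending on the situation of the recipient UR as described above, might be sketched as follows. The situation flags, the three possible actions and all names are assumptions made for this sketch; how the flags are obtained (camera 16, image processing, speaker identification) is outside its scope.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    ANNOUNCE_HEADER = auto()  # output only the audio header CP and ask the recipient
    PLAY = auto()             # output the main part MP
    BUFFER = auto()           # hold the message and try again later


@dataclass
class Situation:
    recipient_present: bool
    recipient_busy: bool         # e.g. on the phone or in a conversation
    others_present: bool
    wants_privacy: bool = False  # recipient asked to listen in private
    confirmed: bool = False      # recipient said "yes" and was verified as UR


def decide(situation: Situation) -> Action:
    """Choose how the receiving device handles a received audio message."""
    if not situation.recipient_present or situation.recipient_busy:
        return Action.BUFFER
    if situation.wants_privacy and situation.others_present:
        return Action.BUFFER
    if not situation.confirmed:
        return Action.ANNOUNCE_HEADER
    return Action.PLAY
```

A buffered message would be re-evaluated with `decide` whenever the observed situation changes, for example once the recipient is alone in the room again.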
Claims

1. A method for sending an audio message (AM) from a sender (Us) to a recipient (UR) over an audio messaging system, comprising the following steps:
- collecting a sender's (Us) audio message using a transmitting device (2T);
- analysing the audio message (AM) for detecting a control information part (CP) concerning communication specifications of the message (AM) and a main part (MP) comprising the effective message which is to be sent to the recipient (UR), where the control information part (CP) of the audio message (AM) is at least partially interpreted for controlling the audio messaging system (1) for communicating the (specific) audio message (AM);
- transmitting at least the main part (MP) of the audio message (AM) to a receiving device (2R);
- presenting at least the main part (MP) of the audio message (AM) to the recipient (UR).
2. A method according to claim 1, where the control information part (CP) of the audio message (AM) is at least partially transmitted to the receiving device (2R) and interpreted for controlling the presentation of the audio message (AM) to the recipient (UR).
3. A method according to claim 1 or 2, where the control information part (CP) of the audio message (AM) is at least partially presented to the recipient (UR).
4. A method according to any of claims 1 to 3, where the audio message (AM) is built up in a defined composite structure in which the control information part (CP) is positioned at a specific position respective to the main part (MP).
5. A method according to any of claims 1 to 4, where the control information part (CP) is identified in the audio message by using automatic speech recognition techniques.
6. A method according to claim 5, where an automatic dialog between the audio messaging system (1) and the sender is initiated to identify the control information part (CP) of the audio message (AM), if an ambiguity value of a recognition result of an automatic speech recognition arrangement (7) reaches or exceeds a certain ambiguity limit.
7. A method according to any of claims 1 to 6, where unique identifier strings (IS) are associated with possible users or user groups of the audio messaging system and the control information part (CP) of the audio message (AM) comprises an identifier string (IS) associated with the recipient (UR) of this audio message (AM).
8. A method according to any of claims 1 to 7, where an identifier string (IS) of a user or user group is associated with identifier characteristics (IC) of the user or of the user group and/or of different members of the user group.
9. A method according to claim 8, where an authorised recipient (UR) of the audio message (AM) is identified based on the identifier characteristics (IC) before presenting the main part (MP) of the audio message.
10. A method according to claim 8 or 9, where the sender (Us) of the audio message (AM) is identified based on the identifier characteristics (IC).
11. A method according to any of claims 1 to 10, where a situation in which an identified recipient (UR) is currently involved is automatically analysed and the audio message (AM) is presented to the recipient (UR) in a specific form and/or at a specific time depending on the situation.
12. A method according to claim 10 or 11, where an automatic dialog between the audio messaging system (1) and the recipient (UR) is initiated to identify the recipient (UR) and/or to analyse the current situation.
13. A method according to any of claims 1 to 12, where at least the main part (MP) of the audio message (AM) is presented to the recipient over a user interface (10) which comprises an automatically directable front aspect (17) which is directed to face the recipient during presentation of the message.
14. An audio messaging system (1) for sending an audio message (AM) from a sender (Us) to a recipient (UR) comprising:
- a transmitting device (2T) with a user interface (10) for collecting a sender's (Us) audio message (AM);
- a message analysing means (7) for analysing the audio message for detection of a control information part (CP) concerning communication specifications of the audio message (AM) and a main part (MP) comprising the effective message which is to be sent to the recipient (UR);
- an interpreting unit (4) for at least partially interpreting the control information part (CP) of the audio message (AM) for controlling the audio messaging system (1) for communicating the (specific) audio message (AM);
- a receiving device (2R) with a user interface (10) for presenting at least the main part (MP) of the audio message (AM) to the recipient (UR);
- means for transmitting (13, 14, N) at least the main part (MP) of the audio message (AM) from the transmitting device (2T) to the receiving device (2R).
15. A transmitting device (2T) for an audio messaging system (1) according to claim 14 comprising:
- a user interface (10) for collecting a sender's (Us) audio message (AM),
- message analysing means (7) for analysing the audio message (AM) for detecting a control information part (CP) concerning communication specifications of the audio message and a main part (MP) comprising the effective message which is to be sent to a specific recipient (UR),
- an interpreting unit (4) for at least partially interpreting the control information part (CP) of the audio message (AM) for controlling the audio messaging system (1) for communicating the audio message (AM),
- and a transmitting interface (13,14) for transmitting at least the main part (MP) of the audio message (AM) to a receiving device (2R).
16. A receiving device (2R) for an audio messaging system according to claim 14 comprising:
- a receiving interface (13,14) for receiving an audio message (AM) which is sent by a transmitting device (2T) and which audio message (AM) comprises a control information part (CP) concerning communication specifications of the audio message (AM) and a main part (MP) comprising the effective message which is to be sent to a specific recipient (UR),
- a user interface (10) for presenting at least the main part of the audio message to the recipient,
- and an interpreting unit (4) for at least partially interpreting the control information part (CP) of the audio message (AM) for controlling the audio messaging system (1) for presenting the audio message (AM).

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2007507894A JP2007533236A (en) 2004-04-13 2005-04-08 Method and system for sending voice messages
EP05718667A EP1738277A1 (en) 2004-04-13 2005-04-08 Method and system for sending an audio message

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP04101495 2004-04-13
EP04101495.2 2004-04-13

Publications (1)

Publication Number Publication Date
WO2005101259A1 true WO2005101259A1 (en) 2005-10-27

Family

ID=34963001

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2005/051156 Ceased WO2005101259A1 (en) 2004-04-13 2005-04-08 Method and system for sending an audio message

Country Status (5)

Country Link
EP (1) EP1738277A1 (en)
JP (1) JP2007533236A (en)
KR (1) KR20060133002A (en)
CN (1) CN1943191A (en)
WO (1) WO2005101259A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2000019408A1 (en) * 1998-09-30 2000-04-06 Lernout & Hauspie Speech Products N.V. Voice command navigation of electronic mail reader
WO2001084764A2 (en) * 2000-05-04 2001-11-08 Microsoft Corporation Transmitting information given constrained resources
EP1191752A1 (en) * 2000-09-26 2002-03-27 Daniel Gens Method and device for information exchange
WO2002051114A1 (en) * 2000-12-18 2002-06-27 Agentai, Inc. Service request processing performed by artificial intelligence systems in conjunction with human intervention
US20030200096A1 (en) * 2002-04-18 2003-10-23 Masafumi Asai Communication device, communication method, and vehicle-mounted navigation apparatus
WO2003096171A1 (en) * 2002-05-14 2003-11-20 Philips Intellectual Property & Standards Gmbh Dialog control for an electric apparatus

Also Published As

Publication number Publication date
CN1943191A (en) 2007-04-04
JP2007533236A (en) 2007-11-15
EP1738277A1 (en) 2007-01-03
KR20060133002A (en) 2006-12-22

Legal Events

Date Code Title Description
AK Designated states; Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW
AL Designated countries for regional patents; Kind code of ref document: A1; Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
121 Ep: the epo has been informed by wipo that ep was designated in this application
WWE Wipo information: entry into national phase; Ref document number: 2005718667; Country of ref document: EP
WWE Wipo information: entry into national phase; Ref document number: 1020067021037; Country of ref document: KR
WWE Wipo information: entry into national phase; Ref document number: 200580011084.8; Country of ref document: CN; Ref document number: 2007507894; Country of ref document: JP
NENP Non-entry into the national phase; Ref country code: DE
WWW Wipo information: withdrawn in national office; Country of ref document: DE
WWP Wipo information: published in national office; Ref document number: 1020067021037; Country of ref document: KR
WWP Wipo information: published in national office; Ref document number: 2005718667; Country of ref document: EP
WWW Wipo information: withdrawn in national office; Ref document number: 2005718667; Country of ref document: EP