WO2015043529A1

WO2015043529A1 - Method and system for replying to social application information

Info

Publication number: WO2015043529A1
Application number: PCT/CN2014/087735
Authority: WO
Inventors: Maowei Yang
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2013-09-29
Filing date: 2014-09-29
Publication date: 2015-04-02
Anticipated expiration: 2016-03-29
Also published as: CN104518951B; CN104518951A

Abstract

A server with processor (s) and memory receives a first instruction from a user to initiate a recording process for sending a voice message to a recipient. During the recording process, the server : obtains at least a first portion of the voice message from the user; performs speech-to-text processing to convert the first portion of the voice message to a first text portion; provides the first text portion to the user for display; and, after providing the first text portion to the user, receives a second instruction from the user to terminate the recording process. In response to receiving the second instruction, the server determines a recipient preference for the voice message and/or a result of the speech-to-text processing. In accordance with a determination that the recipient preference is for the result of the speech-to-text processing, the server provides the first text portion of the voice message to the recipient.

Description

METHOD AND SYSTEM FOR REPLYING TO SOCIAL APPLICATION INFORMATION

PRIORITY CLAIM AND RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 2013104618437, entitled “Method and Apparatus for Replaying to Instant Messages, ” filed on September 29, 2013, which is incorporated by reference in its entirety.

FIELD OF THE TECHNOLOGY

The present disclosed technology relates to the field of electronic technologies, and in particular, to a method and system for replying to social application information.

BACKGROUND OF THE TECHNOLOGY

With the rapid development of social application technologies, social applications are widely applied in everyday use. At present, the prior art provides a method for replying to social application information (e.g. , messages and posts) , whereby a developer or operator of a social application responds to social application information from a user of the social application by logging into the social application, checking the social application information sent by the user, and then manually replying to the user. This method of manually replying to users is very inefficient from the developer’s perspective.

SUMMARY

The embodiments of the present disclosure provide methods and systems for disseminating messages (e.g. , voice messages and/or converted text from the voice messages) and replying to messages in a social networking platform. In particular, the message dissemination may be tailored to a processing preference of public accounts that are may respond to user questions and inquiries received over a social network platform using automated response logic. In some usage scenarios, the message dissemination may also take into a personal or situational preference of real human users for messages received over a social network platform.

In some embodiments, a method of disseminating messages in a social networking platform is performed at a server system (e.g. , server system 108, Figures 1-2) with one or more processors and memory. The method includes receiving a first instruction from a user in the social networking platform to initiate a recording process for sending a voice message to a recipient in the social networking platform. During the recording process, the method includes: obtaining at least a first portion of the voice message from the user； performing speech-to-text processing to convert the first one portion of the voice message to a first text portion； providing the first text portion to the user for display at a client device corresponding to the user； and, after providing the first text portion to the user, receiving a second instruction from the user to terminate the recording process. In response to receiving the second instruction from the user to terminate the recording process, the method includes determining a recipient preference for at least one of the voice message and a result of the speech-to-text processing. In accordance with a determination that the recipient preference is for the result of the speech-to-text processing, the method includes providing at least the first text portion corresponding to the voice message to the recipient. The server provides the voice message to the recipient without the speech-to-text result, if the determined recipient preference is for the voice message only. In some embodiments, the server may also provide both the voice message and the speech-to-text result to the recipient, if the determined recipient preference is for both.

In some embodiments, a computer system (e.g. , server system 108 (Figures 1-2) , client device 104 (Figures 1 and 3) , or a combination thereof) includes one or more processors and memory storing one or more programs for execution by the one or more processors, the one or more programs include instructions for performing, or controlling performance of, the operations of any of the methods described herein. In some embodiments, a non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which, when executed by a computer system (e.g. , server system 108 (Figures 1-2) , client device 104 (Figures 1 and 3) , or a combination thereof) with one or more processors, cause the computer system to perform, or control performance of, the operations of any of the methods described herein. In some embodiments, a computer system (e.g. , server system 108 (Figures 1-2) , client device 104 (Figures 1 and 3) , or a combination thereof) includes means for performing, or controlling performance of, the operations of any of the methods described herein.

Various advantages of the present application are apparent in light of the descriptions below.

BRIEF DESCRIPTION OF THE DRAWINGS

The aforementioned features and advantages of the disclosed technology as well as additional features and advantages thereof will be more clearly understood hereinafter as a result of a detailed description of preferred embodiments when taken in conjunction with the drawings.

To describe the technical solutions in the embodiments of the present disclosed technology or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of the present disclosed technology, and persons of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.

Figure 1 is a block diagram of a server-client environment in accordance with some embodiments.

Figure 2 is a block diagram of a server system in accordance with some embodiments.

Figure 3 is a block diagram of a client device in accordance with some embodiments.

Figure 4 is a block diagram of an external service in accordance with some embodiments.

Figures 5A-5I illustrate exemplary user interfaces for generating a voice message in a social networking platform in accordance with some embodiments.

Figure 6 illustrates a flowchart diagram of a method of replying to social application information in accordance with some embodiments.

Figure 7 illustrates a flowchart diagram of a method of replying to social application information in accordance with some embodiments.

Figure 8 illustrates a flowchart diagram of a method of replying to social application information in accordance with some embodiments.

Figure 9A illustrates a flowchart diagram of a method of replying to social application information in accordance with some embodiments.

Figure 9B illustrates a flowchart diagram of a method of replying to social application information in accordance with some embodiments.

Figures 10A-10C illustrate a flowchart diagram of a method of disseminating messages in a social networking platform with some embodiments.

Figure 11 illustrate block diagrams of a server-side module in accordance with some embodiments.

Like reference numerals refer to corresponding parts throughout the serveral views of the drawings.

DESCRIPTION OF EMBODIMENTS

Reference will now be made in detail to embodiments, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one skilled in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application. Apparently, the described embodiments are merely a part rather than all of the embodiments of the present application. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present application without creative efforts shall fall within the protection scope of the present application.

As shown in Figure 1, data processing for a social networking platform or other application is implemented in a server-client environment 100 in accordance with some embodiments. In accordance with some embodiments, server-client environment 100 includes client-side processing 102-1, 102-2 (hereinafter “client-side modules 102” ) executed on a client device 104-1, 104-2, and server-side processing 106 (hereinafter “server-side module 106” ) executed on a server system 108. Client-side module 102 communicates with server-side module 106 through one or more networks 110. Client-side module 102 provides client-side functionalities for the social networking platform (e.g. , communications, payment processing, etc. ) and communications with server-side module 106. Server-side module 106 provides server-side functionalities for the social networking platform (e.g. , communications, payment processing, user authentication, etc. ) for any number of client modules 102 each residing on a respective client device 104.

In some embodiments, server-side module 106 includes one or more processors 112, messages database 114, profiles database 116, an I/O interface to one or more clients 118, and an I/O interface to one or more external services 120. I/O interface to one or more clients 118 facilitates the client-facing input and output processing for server-side module 106. One or more processors 112 perform speech-to-text (STT) processing on voice messages received from users of the social networking platform and, in some circumstances, provide the text result of the STT processing to the recipients of the voice messages according to a preference of the recipients. Messages database 114 stores text and voice messages sent by users in the social networking platform and, in some circumstances, converted text corresponding to the voice messages. Profiles database 116 stores a user profile for each user or external service 122 associated with the social networking platform. I/O interface to one or more external services 120 facilitates communications with one or more external services 122. For example, a respective external service 122 corresponds to an application server for a weather application, a car/taxi service application, a movie times application, a banking application, a bot-chat application, or other similar application.

Examples of client device 104 include, but are not limited to, a handheld computer, a wearable computing device, a personal digital assistant (PDA) , a tablet computer, a laptop computer, a desktop computer, a cellular telephone, a smart phone, an enhanced general packet radio service (EGPRS) mobile phone, a media player, a navigation device, a game console, a television, a remote control, or a combination of any two or more of these data processing devices or other data processing devices.

Examples of one or more networks 110 include local area networks (LAN) and wide area networks (WAN) such as the Internet. One or more networks 110 are, optionally, implemented using any known network protocol, including various wired or wireless protocols, such as Ethernet, Universal Serial Bus (USB) , FIREWIRE, Global System for Mobile Communications (GSM) , Enhanced Data GSM Environment (EDGE) , code division multiple access (CDMA) , time division multiple access (TDMA) , Bluetooth, Wi-Fi, voice over Internet Protocol (VoIP) , Wi-MAX, or any other suitable communication protocol.

Server system 108 is implemented on one or more standalone data processing apparatuses or a distributed network of computers. In some embodiments, server system 108 also employs various virtual devices and/or services of third party service providers (e.g. , third-party cloud service providers) to provide the underlying computing resources and/or infrastructure resources of server system 108.

Server-client environment 100 shown in Figure 1 includes both a client-side portion (e.g. , client-side module 102) and a server-side portion (e.g. , server-side module 106) . In some embodiments, data processing is implemented as a standalone application installed on client device 104. In addition, the division of functionalities between the client and server portions of client environment data processing can vary in different embodiments. For example, in some embodiments, client-side module 102 is a thin-client that provides only user-facing input and output processing functions, and delegates all other data processing functionalities to a backend server (e.g. , server system 108) . In another example, client-side module 102 performs STT processing and a backend server (e.g. , server system 108) performs other functions of the social networking platform (e.g. , communications routing/handling and payment processing) .

Figure 2 is a block diagram illustrating server system 108 in accordance with some embodiments. Server system 108, typically, includes one or more processing units (CPUs) 112, one or more network interfaces 204 (e.g. , including I/O interface to one or more clients 118 and I/O interface to one or more external services 120) , memory 206, and one or more communication buses 208 for interconnecting these components (sometimes called a chipset) . Memory 206 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices； and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 206, optionally, includes one or more storage devices remotely located from one or more processing units 112. Memory 206, or alternatively the non-volatile memory within memory 206, includes a non-transitory computer readable storage medium. In some implementations, memory 206, or the non-transitory computer readable storage medium of memory 206, stores the following programs, modules, and data structures, or a subset or superset thereof:

·operating system 210 including procedures for handling various basic system services and for performing hardware dependent tasks；

·network communication module 212 for connecting server system 108 to other computing devices (e.g. , client devices 104 and external service (s) 122) connected to one or more networks 110 via one or more network interfaces 204 (wired or wireless) ；

·server-side module 106, which provides server-side data processing for a social networking platform (e.g. , communications, payment processing, user authentication, etc. ) , includes, but is not limited to:

opayment processing module 214 for processing transactions for a respective user of the social networking platform based on payment data in a user profile in profiles database 116 corresponding to the respective user； and

ocommunications module 220 for managing and routing messages sent between users of the social networking platform, including but not limited to；

■message handling module 222 for receiving messages from a sender and sending the messages to their intended recipient (s) ；

■instruction handling module 224 for at least receiving a first instruction from a user (i.e. , the sender) in the social networking platform to initiate a recording process in order to send a voice message to a recipient in the social networking platform and a second instruction from the user to terminate the recording process and send to the recipient the voice message and/or converted text corresponding to the voice message；

■obtaining/streaming module 226 for obtaining at least a portion of the voice message in response to receiving the first instruction from the user to initiate the recording process and, optionally, for establishing a connection with the sender of the voice message so as to stream, in real-time, the voice message to server system 108 in response to receiving the first instruction from the user to initiate the recording process；

■speech-to-text (STT) module 228 for performing STT processing on the obtained voice message to convert the voice message to text；

■providing module 230 for providing the result of the STT processing (e.g. , converted text corresponding to the voice message) to the user (i.e. , the sender) for presentation in real-time；

■terminating module 232 for terminating the recording process in response to receiving the second instruction from the user to terminate the recording process；

■STT tuning module 234 for adjusting parameters of STT module 228 in response to obtaining STT processing results from external service (s) 122；

■determining module 236 for determining a recipient preference in a user profile of the recipient for at least one of the voice message and a result of the speech-to-text processing after terminating the recording process；

■sending module 238 for sending at least one of the voice message and a result of the speech-to-text processing to the recipient according to the determination by determining module 236； and

■response handling module 240 for receiving a response from the recipient and sending/forwarding the response to the sender of the voice message； and

·server data 250 storing data for the social networking platform, including but not limited to:

omessages database 114 storing text and voice messages sent by users in the social networking platform and, in some circumstances, converted text as the result of STT processing on the voice messages； and

oprofiles database 116 storing user profiles for users of client-side modules 102 and external service (s) 122 in the social networking platform, where a respective user profile for a user includes a user identifier (e.g. , an account name or handle) , login credentials to the social networking platform, (optionally) payment data (e.g. , linked credit card information, app credit or gift card balance, billing address, shipping address, etc. ) , STT preferences, an IP address or preferred contact information, contacts list, custom parameters for the user (e.g. , age, location, hobbies, etc. ) , and identified trends and/or likes/dislikes of the user.

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e. , sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 206, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 206, optionally, stores additional modules and data structures not described above.

Figure 3 is a block diagram illustrating a representative client device 104 associated with a user in accordance with some embodiments. Client device 104, typically, includes one or more processing units (CPUs) 302, one or more network interfaces 304, memory 306, and one or more communication buses 308 for interconnecting these components (sometimes called a chipset) . Client device 104 also includes a user interface 310. User interface 310 includes one or more output devices 312 that enable presentation of media content, including one or more speakers and/or one or more visual displays. User interface 310 also includes one or more input devices 314, including user interface components that facilitate user input such as a keyboard, a mouse, a voice-command input unit or microphone, a touch screen display, a touch-sensitive input pad, a camera, a gesture capturing camera, or other input buttons or controls. Furthermore, some client devices 104 use a microphone and voice recognition or a camera and gesture recognition to supplement or replace the keyboard. Client device 104 further includes sensors 315, which provide information as to the environmental conditions associated with client device 104. Sensors 315 include but are not limited to one or more microphones, one or more cameras, an ambient light sensor, one or more accelerometers, one or more gyroscopes, a GPS positioning system, a Bluetooth or BLE system, a temperature sensor, one or more motion sensors, one or more biological sensors (e.g. , a galvanic skin resistance sensor, a pulse oximeter, and the like) , and other sensors. Memory 306 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices； and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 306, optionally, includes one or more storage devices remotely located from one or more processing units 302. Memory 306, or alternatively the non-volatile memory within memory 306, includes a non-transitory computer readable storage medium. In some implementations, memory 306, or the non-transitory computer readable storage medium of memory 306, stores the following programs, modules, and data structures, or a subset or superset thereof:

·operating system 316 including procedures for handling various basic system services and for performing hardware dependent tasks；

·network communication module 318 for connecting client device 104 to other computing devices (e.g. , server system 108 and external service (s) 122) connected to one or more networks 110 via one or more network interfaces 304 (wired or wireless) ；

·presentation module 320 for enabling presentation of information (e.g. , a user interface for a social networking platform, widget, websites or web pages thereof, game, and/or application, audio and/or video content, text, etc. ) at client device 104 via one or more output devices 312 (e.g. , displays, speakers, etc. ) associated with user interface 310；

·input processing module 322 for detecting one or more user inputs or interactions from one of the one or more input devices 314 and interpreting the detected input or interaction；

·web browser module 324 for navigating, requesting (e.g. , via HTTP) , and displaying websites and web pages thereof；

·one or more applications 326-1 –326-N for execution by client device 104 (e.g. , games, application marketplaces, payment platforms, and/or other applications) ； and

·client-side module 102, which provides client-side data processing and functionalities for the social networking platform, including but not limited to:

opayment processing 330 for processing payments associated with transactions initiated within the social networking platform or at a merchant’s website within web browser module 324； and

ocommunication system 332 for sending messages to and receiving messages from other users of the social networking platform (e.g. , instant messaging, group chat, message board, message/news feed, and the like) , including but not limited to:

■request handling module 334 for at least detecting a first instruction from a user of client device 104 to initiate a recording process and a second instruction from the user to terminate the recording process, and for forwarding the first and second instructions (or a notification thereof) to server system 108；

■recording module 336 for recording the voice message during the recording process；

■providing module 338 for providing the voice message or a portion thereof to server system 108 (e.g. , via a stream established by server system 108) ；

■receiving module 340 for receiving the result of STT processing on the voice message (e.g. , converted text) from server system 108； and

■presenting module 342 for presenting the result of STT processing on the voice message, which was received from server system 108, to the user of client device 104 in an interface of client-side module 102； and

·client data 350 storing data associated with the social networking platform, including but not limited to:

ouser profile 352 storing a user profile associated with the user of client device 104 including a user identifier (e.g. , an account name or handle) , login credentials to the social networking platform, payment data (e.g. , linked credit card information, app credit or gift card balance, billing address, shipping address, etc. ) , STT preferences, an IP address or preferred contact information, contacts list, custom parameters for the user (e.g. , age, location, hobbies, etc. ) , and identified trends and/or likes/dislikes of the user； and

ouser data 354 storing data authored, saved, liked, or chosen as favorites by the user of client device 104 in the social networking platform.

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e. , sets of instructions) need not be implemented as separate software programs, procedures, modules or data structures, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 306, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 306, optionally, stores additional modules and data structures not described above.

In some embodiments, at least some of the functions of server system 108 are performed by client device 104, and the corresponding sub-modules of these functions may be located within client device 104 rather than server system 108. For example, in some embodiments, STT module 228, STT tuning module 234, and determining module 236 may be implemented at least in part on the client device 104. In some embodiments, at least some of the functions of client device 104 are performed by server system 108, and the corresponding sub-modules of these functions may be located within server system 108 rather than client device 104. Client device 104 and server system 108 shown in Figures 2-3, respectively, are merely illustrative, and different configurations of the modules for implementing the functions described herein are possible in various embodiments.

Figure 4 is a block diagram illustrating a representative external service 122 in accordance with some embodiments. External service 122, typically, includes one or more processing units (CPUs) 402, one or more network interfaces 404, memory 406, and one or more communication buses 408 for interconnecting these components (sometimes called a chipset) . Memory 406 includes high-speed random access memory, such as DRAM, SRAM, DDR RAM, or other random access solid state memory devices； and, optionally, includes non-volatile memory, such as one or more magnetic disk storage devices, one or more optical disk storage devices, one or more flash memory devices, or one or more other non-volatile solid state storage devices. Memory 406, optionally, includes one or more storage devices remotely located from one or more processing units 402. Memory 406, or alternatively the non-volatile memory within memory 406, includes a non-transitory computer readable storage medium. In some implementations, memory 406, or the non-transitory computer readable storage medium of memory 406, stores the following programs, modules, and data structures, or a subset or superset thereof:

·operating system 410 including procedures for handling various basic system services and for performing hardware dependent tasks；

·network communication module 412 for connecting external service 122 to other computing devices (e.g. , server system 108 and client devices 104) connected to one or more networks 110 via one or more network interfaces 404 (wired or wireless) ；

·social networking platform module 420 for interfacing with the social networking platform,including but not limited to:

oreceiving handling module 422 for receiving at least one of a voice message from a sender in the social networking platform and a result of speech-to-text (STT) processing performed by server system 108 on the voice message；

ospeech-to-text (STT) module 424 for performing STT processing on the obtained voice message to convert the voice message to text；

oautomatic reply module 426 for generating a response to the sender’s voice message or corresponding converted text based on the identity of the sender of the voice message and the content of the voice message； and

otransmitting module 428 for sending to server system 108 the response generated by automatic reply module 426 and, optionally, the result of STT module 424； and

·data 440 storing data associated with the social networking platform, including but not limited to:

ouser profile 442 storing a user profile associated with external service 122 including a user identifier (e.g. , an account name or handle) , login credentials to the social networking platform, STT preferences, contacts list, an IP address or preferred contact information, and the like； and

ouser data 444 storing data authored, saved, liked, or chosen as favorites by external service 122 in the social networking platform.

Each of the above identified elements may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above. The above identified modules or programs (i.e. , sets of instructions) need not be implemented as separate software programs, procedures, or modules, and thus various subsets of these modules may be combined or otherwise re-arranged in various implementations. In some implementations, memory 406, optionally, stores a subset of the modules and data structures identified above. Furthermore, memory 406, optionally, stores additional modules and data structures not described above.

Attention is now directed towards embodiments of user interfaces and associated processes that may be implemented on a client device 104 with one or more speakers 502, one or more microphones 504, and a touch screen 506 (sometimes also herein called a touch screen display) enabled to receive one or more contacts and display information (e.g. , media content, websites and web pages thereof, and/or user interfaces for an application such as a web browser or the social networking platform) . Figures 5A-5I illustrate exemplary user interfaces for generating a voice message in a social networking platform in accordance with some embodiments.

Although some of the examples that follow will be given with reference to inputs on touch screen 506 (where the touch sensitive surface and the display are combined) , in some embodiments, the device detects inputs on a touch-sensitive surface that is separate from the display. In some embodiments, the touch sensitive surface has a primary axis that corresponds to a primary axis on the display. In accordance with these embodiments, the device detects contacts with the touch-sensitive surface at locations that correspond to respective locations on the display. In this way, user inputs detected by the device on the touch-sensitive surface are used by the device to manipulate the user interface on the display of the device when the touch-sensitive surface is separate from the display. It should be understood that similar methods are, optionally, used for other user interfaces described herein.

Additionally, while the following examples are given primarily with reference to contacts (e.g. , finger inputs such as finger contacts, finger tap gestures, finger swipe gestures, etc. ) , it should be understood that, in some embodiments, one or more of the contacts are replaced with input from another input device (e.g. , a mouse-based, stylus-based, or physical button-based input) . For example, a swipe gesture is, optionally, replaced with a mouse click (e.g. , instead of a contact) followed by movement of the cursor along the path of the swipe (e.g. , instead of movement of the contact) . As another example, a tap gesture is, optionally, replaced with a mouse click while the cursor is located over the location of the tap gesture (e.g. , instead of detection of the contact followed by ceasing to detect the contact) or depression of a physical button. Similarly, when multiple user inputs are simultaneously detected, it should be understood that multiple computer mice are, optionally, used simultaneously, or a mouse and finger contacts are, optionally, used simultaneously.

Figures 5A-5I show interface 508 for an application associated with the social networking platform (e.g. , client-side module 102, Figures 1 and 3) displayed on client device 104 (e.g. , a mobile phone) ； however, one skilled in the art will appreciate that the user interfaces shown in Figures 5A-5I may be implemented on other similar computing devices. The user interfaces in Figures 5A-5I are used to illustrate the processes described herein, including the process described with respect to Figures 10A-10C.

For example, a user executes an application associated with the social networking platform (e.g. , client-side module 102, Figures 1 and 3) on client device 104. After executing the application associated with the social networking platform, client device 104 displays a home interface for the social networking platform, and, subsequently, client device 104 detects user selection of a messaging function of the social networking platform. Thereafter, continuing with this example, client device 104 detects user selection of recipient AA from the user’s contacts list or a list of popular users in the social networking platform. For example, recipient AA is associated with one of one or more external services 122, and recipient AA provides a car/taxi service, whereby a user is able to message recipient AA with the messaging function of the social networking platform to arrange for pickup by a car/taxi.

Figure 5A illustrates client device 104 displaying a messaging interface on touch screen 506 between the user of client device 104 and recipient AA (e.g. , a message thread) for the application associated with the social networking platform (e.g. , client-side module 102, Figures 1 and 3) . In Figure 5A, a first region 510 of the messaging interface includes: back affordance 514, which, when activated (e.g. , with a tap gesture) causes the application executed on client device 104 to display a previous interface (e.g. , a list of the user’s contacts) ； image/avatar 516 for recipient AA, which, when activated (e.g. , with a tap gesture) causes the application executed on client device 104 to display a profile or page within the social networking platform for recipient AA； “See conversation history” affordance 518, which, when activated (e.g. , with a tap gesture) causes the application executed on client device 104 to display previous messages between the user of client device 104 and recipient AA (if there are any) in first region 510； and message entry box 520 which enables the user of client device 104 to manually input a text-based message for recipient AA. In Figure 5A, a second region 512 of the messaging interface includes recording affordance 522 (e.g. , a press/tap to talk button) , which, when activated (e.g. , with a tap gesture, or a press-and-hold gesture) causes the application executed on client device 104 to initiate and maintain a recording process in order to record a voice message for recipient AA. Figure 5A also illustrates client device 104 detecting contact 524 at location 528-acorresponding to recording affordance 522.

In some embodiments, in response to detecting selection of recording affordance 522 in Figure 5A, the application associated with the social networking platform (e.g. , client-side module 1-2, Figures 1 and 3) , which is executed on client device 104, or a component thereof (e.g. , request handling module 334, Figure 3) sends a first instruction to server system 108 indicating that the user of client device 104 has initiated a recording process in order to send a voice message to recipient AA. In some embodiments, in response to receiving the first instruction, server system 108 or a component thereof (e.g. , obtaining/streaming module 226, Figure 2) establishes a connection with the application executed on client device 104 via network (s) 110 in order to stream, in real-time, the voice message to server system 108. Thereafter, in some embodiments, server system 108 or a component thereof (e.g. , STT module 228, Figure 2) converts the streamed voice message to text by performing STT processing on the streamed voice message and provides the converted text to the application executed on client device 104 for display to the user in the messaging interface of the application.

Figure 5B illustrates client device 104 displaying a speech-to-text region 526 for messaging interface in addition to first region 510 and second region 512. In Figure 5B, first region 510, second region 512, and speech-to-text region 526 are each distinct from one another. In Figure 5B, the user of client device 104 maintains contact 524 at location 528-a (e.g. , a press and hold gesture) , and speech-to-text region 526 displays a portion of the converted text received from server system 108 as the user continues to speak his/her voice message: “Hi, Recipient AA. Please …” In some embodiments, speech-to-text region 526 displays converted text, in real-time, as it is received from server system 108.

In Figure 5C, the user of client device 104 maintains contact 524 at location 528-a, and speech-to-text region 526 displays more of the converted text received from server system 108 as the user continues to speak his/her voice message: “Hi, Recipient AA. Please send an executive car to the corner …”

In Figure 5D, the user of client device 104 maintains contact 524 at location 528-a, and speech-to-text region 526 displays the complete converted text received from server system 108: “Hi, Recipient AA. Please send an executive car to the corner of Colombia and Telephone. ” Figure 5D illustrates client device 104 detecting a slide gesture (e.g. , a finger movement with no breaking of the contact with the display screen) whereby contact 524 moves from location 528-ato location 528-b corresponding to a sub-portion of the text in speech-to-text region 526 (e.g. , the word “Colombia” ) .

In some embodiments, in response to detecting selection of a sub-portion of the text in speech-to-text region 526, the application associated with the social networking platform (e.g. , client-side module 1-2, Figures 1 and 3) , which is executed on client device 104, sends instruction (s) to server system 108 indicating that the user of client device wishes to see a replacement or alternate word (s) for the selected sub-portion of the text. For example, in response to receiving the instruction (s) , server system 108 sends one or more alternate word (s) to client device 104 for display in speech-to-text region 526 of the messaging interface at a predefined frequency. As such, server system 108 sends alternate word (s) one at a time every X seconds for replacement of the selected sub-portion of the text until the one or more alternate word (s) are exhausted or the user of client device 104 is either satisfied with the alternate word (s) or takes another action (e.g. , aborting the recording process) .

In Figure 5E, after one or more alternate words for “Colombia” have been presented to the user of client device 104, the word “Colombia” is replaced with “Columbia” in speech-to-text region 526. The server receives a signal from the client device indicating that the user is satisfied with the current replacement word for that selected sub-portion of text. The signal can be sent to the server by the client device when the user moves the contact 524 away from the selected portion of the text at location 528-b. Figure 5E illustrates client device 104 detecting a slide gesture whereby contact 524 moves from location 528-b to location 528-c corresponding to another sub-portion of the text in speech-to-text region 526 (e.g. , the word “Telephone” ) . For example, the user of client device 104 is satisfied with “Columbia” as a replacement for “Colombia, ” and, subsequently, the user slides his/her finger over the word “Telephone” in order to view suitable replacement for “Telephone. ”

In Figure 5F, after one or more alternate words for “Telephone” have been presented to the user of client device 104, the word “Telephone” is replaced with “Telegraph” in speech-to-text region 526. Figure 5F illustrates client device 104 detecting a slide gesture whereby contact 524 moves from location 528-c to location 528-d corresponding to second region 512 (but not onto recording affordance 522) . For example, the user of client device 104 is satisfied with “Telegraph” as a replacement for “Telephone, ” and, subsequently, the user slides his/her finger out of speech-to-text-region 526 in order to manually edit the text in speech-to-text-region 526.

For example, in response to detecting a slide gesture (e.g. , with contact 524) from a location within speech-to-text region 526 (e.g. , in Figure 5E or Figure 5F) to a location in second region 512 (but not onto recording affordance 522) or a location in first region 5120, the recording process is paused and the application associated with the social networking platform (e.g. , client-side module 1-2, Figures 1 and 3) , which is executed on client device 104, displays a virtual keyboard (e.g. , Figure 5H) whereby the user of client device 104 is able to manually edit the text in speech-to-text region 526. In another example, in response to detecting a slide gesture (e.g. , with contact 524) from a location corresponding to recording affordance 522 (e.g. , in Figure 5B, Figure 5C, or Figure 5D) to a location in second region 512 (but not onto recording affordance 522) or a location in first region 510, the recording process is paused and the application displays a virtual keyboard (e.g. , Figure 5H) whereby the user of client device 104 is able to manually edit the text in speech-to-text region 526. In another example, in response to detecting lift-off of contact 524 in Figure 5D from location 528-a, Figure 5E from location 528-b, or Figure 5F from location 528-c, the application sends a second instruction to server system 108 indicating termination of the recording process and, also, that the user wishes to send recipient AA the voice message and/or converted text corresponding to the voice message (e.g. , the text displayed in speech-to-text-region 526 in Figure 5D, Figure 5E, or Figure 5F, respectively) .

Figure 5G illustrates client device 104 displaying virtual keyboard 530 in third region 531 of the messaging interface in response to detecting the slide gesture in Figure 5F. In Figure 5G, first region 510, second region 512, speech-to-text region 526, and third region 531 are each distinct from one another. In Figure 5G, for example, the recording process is paused and the user of client device 104 is able to manually edit the text in speech-to-text region 526 by selecting an insertion point in the text displayed in speech-to-text-region 526 and selecting keys of virtual keyboard 530 (keys not shown) to enter or delete characters. For example, virtual keyboard 530 includes virtual/soft keys corresponding to the keys of a common QWERTY keyboard. In Figure 5G, third region 531 also includes send affordance 532, which, when activated (e.g. , with a tap gesture) causes the application executed on client device 104 to send a second instruction to server system 108 indicating termination of the recording process and, also, that the user wishes to send recipient AA the voice message and/or converted text corresponding to the voice message (e.g. , the text displayed in speech-to-text-region 526 in Figure 5G) .

Figure 5H illustrates client device 104 displaying edited text in speech-to-text region 526 after the user of client device 104 edits the text in speech-to-text region 526 with virtual keyboard 530 in Figure 5G. In Figure 5H, the text in speech-to-text region 526 has been edited to read, “Hi, Recipient AA. Please send a taxi to the corner of Columbia and Telegraph, ” whereas, in Figure 5G, the text in speech-to-text region 526 read, “Hi, Recipient AA. Please send an executive car to the corner of Columbia and Telegraph. ” Figure 5H also illustrates client device 104 detecting contact 534 at a location corresponding to recording affordance 522. For example, after selecting recording affordance 522, the application associated with the social networking platform (e.g. , client-side module 102, Figures 1 and 3) , which is executed on client device 104, un-pauses the recording process in order for the user of client device 104 to record an addition to the voice message corresponding to the text in speech-to-text region 526.

Figure 5I illustrates client device 104 displaying text in speech-to-text region 526 that reads, “Hi, Recipient AA. Please send an executive car to the corner of Columbia and Telegraph. We need it in 10 minutes, ” after the user of client device 104 records an addition to the voice message in Figure 5H. In Figure 5I also illustrates client device 104 is no longer detecting contact 534. Thus, for example, the application associated with the social networking platform (e.g. , client-side module 102, Figures 1 and 3) , which is executed on client device 104, sends a second instruction to server system 108 indicating termination of the recording process and, also, that the user wishes to send recipient AA the voice message and/or converted text corresponding to the voice message (e.g. , the text displayed in speech-to-text-region 526 in Figure 5I) .

Figure 6 illustrates a flowchart diagram of a method 600 of replying to social application information in accordance with some embodiments. In some embodiments, method 600 is performed by a server with one or more processors and memory within a server-client environment 100 for a social networking platform. In some embodiments, the server manages and operations the social networking platform. In some embodiments, in addition to the server, server-client environment 100 includes one or more client devices 104 (Figures 1 and 3) (sometimes also herein called a “terminal” or a “mobile terminal” ) each associated with a user account in the social networking platform, and one or more external services 122 (Figures 1 and 4) (sometimes also herein called a “background server” ) each associated with a public account in the social networking platform. For example, in some embodiments, method 600 is performed by server system 108 (Figures 1-2) or a component thereof (e.g. , server-side module 106, Figures 1-2) . In some embodiments, method 600 is governed by instructions that are stored in a non-transitory computer readable storage medium and the instructions are executed by one or more processors of the server system.

The server receives (602) an identifier of a user account, an identifier of a public account (e.g. , the recipient of the social application information) , and social application information sent by a user in the social networking platform from a terminal in server-client environment 100. In some embodiments, the public account is a social account that can broadcast social application information to one or more users in the social networking platform that subscribe to the social account. For example, the social application information includes a text-based or voice-based message.

The server sends (604) the identifier of the user account and the social application information to a background server corresponding to the public account in the social networking platform. For example, the background server is a social application server that is managed and operated by a developer of a service provided to the one or more users in the social networking platform. In some embodiments, the background server automatically replies to the social application information sent by the server by generating reply information based on the user account and the content of the social application information.

The server receives (606) the reply information from the background server along with the identifier of the user account and the identifier of the public account.

The server sends (604) , through the public account corresponding to the identifier of the public account, the reply information to the terminal corresponding to the identifier of the user account. For example, the server sends the reply information to the user associated with the user account as a message or social media post from the background server associated with the public account.

Thus, the background server (e.g. , associated with the developer) can automatically reply to the social application information (e.g. , social media posts or messages) , thereby improving the efficiency in replying to social application information.

It should be understood that the particular order in which the operations in Figure 6 have been described is merely exemplary and is not intended to indicate that the described order is the only order in which the operations could be performed. One of ordinary skill in the art would recognize various ways to reorder the operations described herein. Additionally, it should be noted that details of other processes described herein with respect to other methods described herein (e.g. ,

methods

700, 800, 900, and 1000) are also applicable in an analogous manner to method 600 described above with respect to Figure 6.

Figure 7 illustrates a flowchart diagram of a method 700 of replying to social application information in accordance with some embodiments. In some embodiments, method 700 is performed in a server-client environment 100 for a social networking platform. In some embodiments, server-client environment 100 includes a server system 108 (Figures 1-2) (sometimes also herein called a “server” ) , one or more client devices 104 (Figures 1 and 3) (sometimes also herein called a “terminal” or a “mobile terminal” ) each associated with a user account in the social networking platform, and one or more external services 122 (Figures 1 and 4) (sometimes also herein called a “background server” ) each associated with a public account in the social networking platform. In some embodiments, the server manages and operations the social networking platform.

The terminal sends (702) an identifier of a user account, an identifier of a public account (e.g. , the recipient of the social application information) , and social application information to a server, where the social application information is text information. In some embodiments, the server manages a social networking platform implemented in server-client environment 100. In some embodiments, the user of the terminal (e.g. , client device 104, Figures 1 and 3) registers their user account for the social networking platform with the server in advance, and a developer associated with a background server (e.g. , an external service 122, Figures 1 and 4) also registers a public account for the social networking platform with the server in advance. For example, the developer provides a service provided to the one or more users in the social networking platform.

For example, the user subscribes to user accounts associated with other users in the social networking platform and/or the user subscribes to public accounts associated with developers in the social networking platform. In this example, the user receives messages and social media posts from users and/or public accounts to which the user is subscribed, and, also, the user is able to send messages and social media posts to users and/or public accounts to which the user is subscribed. In this example, a developer associated with a public account may broadcast social application information to one or more user accounts that are subscribed to the public account. For example, when a user needs to send social application information to a certain developer, the user inputs the social application information into a terminal associated with the user, where the social application information is text information. Then, the terminal sends an identifier of the user account in the social networking platform that is associated with the user, an identifier of the public account associated with the developer, and the social application information to the server.

In some embodiments, when the user subscribes to a user account associated with another user in the social networking platform, the other user must allow the user to subscribe to him/her； however, the user is automatically allowed to subscribe to a public account without authorization on behalf of the background server associated with the public account. In some embodiments, the social networking platform is an IM (Instant Messenger) application, an SNS (Social Networking Services) application, or the like, or a combination thereof.

The server receives (704) the identifier of the user account, the identifier of the public account, and the text information that are sent by the terminal.

The server sends (706) the identifier of the user account and the text information to a background server corresponding to the identifier of the public account. In some embodiments, the server obtains, according to the identifier of the public account, an address of the background server associated with the developer from a user profile associated with the public account that is stored in profiles database 116 (Figure 1) . For example, profiles database 116 stores profiles for users with user accounts registered in the social networking platform and profiles for background servers with public accounts registered in the social networking platform. In this example, a profile for a user includes an IP address, email address, user preferences, and/or the like for the user. In this example, a profile for a background server includes an IP address, email address, and/or the like for the background server. For example, when registering the public account for the social networking platform with the server, the developer specifies their IP address, email address, and/or other contact information

The background server receives (708) the identifier of the user account and the text information, and performs automatic reply processing. In some embodiments, the background server includes an automatic reply processing program whereby the automatic reply processing program automatically generates reply information based on the identity of the user account and the text information. In some embodiments, the automatic reply processing program is developed by the developer according to the demand of the developer for implementing services within the social networking platform.

The background server sends (710) the reply information to the server along with the identifier of the user account and the identifier of the public account.

The server receives (712) the identifier of the user account, the identifier of the public account, and the reply information.

The server sends (714) , through a public account corresponding to the identifier of the public account, the reply information to the terminal corresponding to the identifier of the user account.

In this way, the background server performs, by using the automatic reply processing program, automatic reply processing on the text information sent by the user, thereby improving the efficiency in replying to social application information.

It should be understood that the particular order in which the operations in Figure 7 have been described is merely exemplary and is not intended to indicate that the described order is the only order in which the operations could be performed. One of ordinary skill in the art would recognize various ways to reorder the operations described herein. Additionally, it should be noted that details of other processes described herein with respect to other methods described herein (e.g. ,

methods

600, 800, 900, and 1000) are also applicable in an analogous manner to method 700 described above with respect to Figure 7.

Figure 8 illustrates a flowchart diagram of a method 800 of replying to social application information in accordance with some embodiments. In some embodiments, method 800 is performed in a server-client environment 100 for a social networking platform. In some embodiments, server-client environment 100 includes a server system 108 (Figures 1-2) (sometimes also herein called a “server” ) , one or more client devices 104 (Figures 1 and 3) (sometimes also herein called a “terminal” or a “mobile terminal” ) each associated with a user account in the social networking platform, and one or more external services 122 (Figures 1 and 4) (sometimes also herein called a “background server” ) each associated with a public account in the social networking platform. In some embodiments, the server manages and operations the social networking platform.

The terminal receives (802) voice data from a user of the terminal, and encapsulates the received voice data into a voice data packet. For example, the user inputs a start command into the terminal, which enables one or more microphones of the terminal. Figure 5A, for example, shows client device 104 detecting contact 524 (e.g. , a press and hold gesture) at a location corresponding to recording affordance 522 which causes the application associated with the social networking platform, which is executed on client device 104, to initiate a recording process for recording a voice message to a recipient in the social networking platform (e.g. , the background server corresponding to a public account) . Then, continuing with this example, the user provides voice data to the terminal. In this example, when the user needs to stop the terminal from continuing to receive the voice data, the user inputs a stop command into the terminal. For example, after lift-off of contact 524 in Figure 5D, the application associated with the social networking platform terminates the recording process and causes the voice message to be sent to the recipient.

The terminal sends (804) an identifier of the user account associated with the user of the terminal, an identifier of a public account (e.g. , the recipient of the voice data packet) , and the voice data packet to a server. In some embodiments, the server manages a social networking platform implemented in server-client environment 100. In some embodiments, the user of the terminal (e.g. , client device 104, Figures 1 and 3) registers their user account for the social networking platform with the server in advance, and a developer associated with a background server (e.g. , an external service 122, Figures 1 and 4) also registers a public account for the social networking platform with the server in advance. For example, the developer provides a service provided to the one or more users in the social networking platform.

The server receives (806) the identifier of the user account, the identifier of the public account, and the voice data packet that are sent by the terminal.

The server performs (808) voice recognition (e.g. , speech-to-text (STT) processing) on the received voice data packet to convert the received voice data packet into text information. In some embodiments, in response to receiving the voice data packet, the server invokes a voice recognition algorithm to perform voice recognition on the received voice data packet in order to convert the received voice data packet into the text information.

The server sends (810) the identifier of the user account and the converted text information to a background server corresponding to the identifier of the public account. In some embodiments, the server obtains, according to the identifier of the public account, an address of the background server associated with the developer from a user profile associated with the public account that is stored in profiles database 116 (Figure 1) . For example, profiles database 116 stores profiles for users with user accounts registered in the social networking platform and profiles for background servers with public accounts registered in the social networking platform. In this example, a profile for a user includes an IP address, email address, user preferences, and/or the like for the user. In this example, a profile for a background server includes an IP address, email address, and/or the like for the background server. For example, when registering the public account for the social networking platform with the server, the developer specifies their IP address, email address, and/or other contact information.

The background server receives (812) the identifier of the user account and the converted text information, and performs automatic reply processing. In some embodiments, the background server includes an automatic reply processing program whereby the automatic reply processing program automatically generates reply information based on the identity of the user account and the converted text information. In some embodiments, the automatic reply processing program is developed by the developer according to the demand of the developer for implementing services within the social networking platform. In some embodiments, the background server receives the voice data packet from the server (in addition to or instead of the converted text information) , and performs voice recognition (e.g. , speech-to-text (STT) processing) on the received voice data packet in order to obtain converted text information for the voice data packet.

The background server sends (814) the reply information to the server along with the identifier of the user account and the identifier of the public account.

The server receives (816) the identifier of the user account, the identifier of the public account, and the reply information.

The server sends (818) , through a public account corresponding to the identifier of the public account, the reply information to the terminal corresponding to the identifier of the user account.

In this way, the background server performs, by using the automatic reply processing program, automatic reply processing on the converted text information, thereby improving the efficiency in replying to social application information.

It should be understood that the particular order in which the operations in Figure 8 have been described is merely exemplary and is not intended to indicate that the described order is the only order in which the operations could be performed. One of ordinary skill in the art would recognize various ways to reorder the operations described herein. Additionally, it should be noted that details of other processes described herein with respect to other methods described herein (e.g. ,

methods

600, 700, 900, and 1000) are also applicable in an analogous manner to method 800 described above with respect to Figure 8.

Figure 9 illustrates a flowchart diagram of a method 900 of replying to social application information in accordance with some embodiments. In some embodiments, method 900 is performed in a server-client environment 100 for a social networking platform. In some embodiments, server-client environment 100 includes a server system 108 (Figures 1-2) (sometimes also herein called a “server” ) , one or more client devices 104 (Figures 1 and 3) (sometimes also herein called a “terminal” or a “mobile terminal” ) each associated with a user account in the social networking platform, and one or more external services 122 (Figures 1 and 4) (sometimes also herein called a “background server” ) each associated with a public account in the social networking platform. In some embodiments, the server manages and operations the social networking platform.

The terminal periodically receives (902) voice data from a user of the terminal, and encapsulates the received voice data into a voice data packet with a sequence number. For example, the user inputs a start command into the terminal, which enables one or more microphones of the terminal. Then, continuing with this example, the user provides voice data to the terminal. In this example, when the user needs to stop the terminal from continuing to receive the voice data, the user inputs a stop command into the terminal.

Specifically, when receiving the start command, the terminal starts to receive the voice data, and the terminal encapsulates received voice data from a predefined timed period (e.g. , 5, 10, 15, etc. seconds) into a voice data packet with a sequence number. For example, the sequence number is a unique, monotonically ascending number. The terminal sends an identifier of a user account of the user of the terminal, an identifier of a public account (e.g. , the recipient of the voice data packet) , and the voice data packet with the sequence number to the server associated for the social networking platform. When receiving the stop command, the terminal encapsulates the voice data received for the current period into a voice data packet with a sequence number and a terminator.

The terminal sends (904) the identifier of the user account, the identifier of the public account (e.g. , the recipient of the voice data packet) , and the voice data packet to the server. In some embodiments, the server manages a social networking platform implemented in server-client environment 100. In some embodiments, the user of the terminal (e.g. , client device 104, Figures 1 and 3) registers their user account for the social networking platform with the server in advance, and a developer associated with a background server (e.g. , an external service 122, Figures 1 and 4) also registers a public account for the social networking platform with the server in advance. For example, the developer provides a service provided to the one or more users in the social networking platform.

The server receives (906) the identifier of the user account, the identifier of the public account, and the voice data packet that are sent by the terminal, and the server performs voice recognition (e.g. , speech-to-text (STT) processing) on the received voice data packet to convert the received voice data packet into text information. In some embodiments, in response to receiving the voice data packet, the server invokes a voice recognition algorithm to perform voice recognition on the received voice data packet in order to convert the received voice data packet into the text information.

The server determines (908) whether the received voice data includes a voice data packet with a sequence number and a terminator or a voice data packet with only a sequence number. In accordance with a determination that the received voice data packet includes a sequence number and a terminator, method 900 continues to operation 910. For example, if the received voice data packet includes a sequence number and a terminator, it indicates that the user has terminated the recording process. In accordance with a determination that the received voice data packet includes only a sequence number, method 900 returns to operation 904. For example, if the received voice data packet includes only a sequence number, it indicates that the user has not completed the recording process and is still inputting voice data.

The server combines (910) the converted text information corresponding to the received voice data packets into combined converted text information according to the sequence numbers of the received voice data packets.

The server sends (912) the identifier of the user account and the combined converted text information to a background server corresponding to the identifier of the public account. In some embodiments, the server obtains, according to the identifier of the public account, an address of the background server associated with the developer from a user profile associated with the public account that is stored in profiles database 116 (Figure 1) . For example, profiles database 116 stores profiles for users with user accounts registered in the social networking platform and profiles for background servers with public accounts registered in the social networking platform. In this example, a profile for a user includes an IP address, email address, user preferences, and/or the like for the user. In this example, a profile for a background server includes an IP address, email address, and/or the like for the background server. For example, when registering the public account for the social networking platform with the server, the developer specifies their IP address, email address, and/or other contact information.

The background server receives (914) the identifier of the user account and the combined converted text information, and performs automatic reply processing. In some embodiments, the background server includes an automatic reply processing program whereby the automatic reply processing program automatically generates reply information based on the identity of the user account and the combined converted text information. In some embodiments, the automatic reply processing program is developed by the developer according to the demand of the developer for implementing services within the social networking platform. In some embodiments, the background server receives the combined voice data packet from the server (in addition to or instead of the combined converted text information) , and performs voice recognition (e.g. , speech-to- text (STT) processing) on the combined voice data packet in order to obtain converted text information for the combined voice data packet.

The background server sends (916) the reply information to the server along with the identifier of the user account and the identifier of the public account.

The server receives (918) the identifier of the user account, the identifier of the public account, and the reply information.

The server sends (920) , through a public account corresponding to the identifier of the public account, the reply information to the terminal corresponding to the identifier of the user account.

In this way, the background server performs, by using the automatic reply processing program, automatic reply processing on the combined converted text information, thereby improving the efficiency in replying to social application information. In addition, the terminal divides voice data from the user into a plurality of voice data packets and sends the plurality of voice data packets to the server. Then, the server receives the voice data packets periodically sent by the terminal and converts the voice data packets into text information, thereby reducing the time for the server to convert the voice data packets, and further improving the efficiency in replying to social application information.

In some embodiments, as an alternative to operations 902-912 of method 900, the server performs operations 952-962 (as shown in Figure 9B) .

The server receives (952) a first instruction from the terminal to initiate a recording process and receives voice data periodically. In some embodiments, the first instruction includes an identity of the user account of the user who provided the start command and an identity of a public account for the recipient of the voice data. For example, the terminal detects a start command input by a user of the terminal and sends a first instruction to the server indicating detection of the start command. Continuing with this example, in response to receiving the first instruction, the server initiates a recording process whereby voice data is streamed from the terminal to the server via a direct connection established by the server

After each predefined timed period (e.g. , 5, 10, 15, etc. seconds) , the server encapsulates (954) the received voice data received in the predefined timed period into a voice data packet with a sequence number.

The server receives (956) a second instruction from the terminal to terminate the recording process. For example, the terminal detects a stop command input by the user of the terminal and sends a second instruction to the server indicating detection of the stop command.

In response to the second instruction, the server encapsulates (958) the voice data received in a current period into a voice data packet with a sequence number and a terminator.

The server performs (960) voice recognition on the voice data packets to convert the voice data packets into combined text information according to the sequence numbers.

The server sends (962) the identifier of the user account and the combined converted text information to a background server corresponding to the identifier of the public account.

It should be understood that the particular order in which the operations in Figures 9A-9B have been described is merely exemplary and is not intended to indicate that the described order is the only order in which the operations could be performed. One of ordinary skill in the art would recognize various ways to reorder the operations described herein. Additionally, it should be noted that details of other processes described herein with respect to other methods described herein (e.g. ,

methods

600, 700, 800, and 1000) are also applicable in an analogous manner to method 900 described above with respect to Figures 9A-9B.

Figures 10A-10C illustrate a flowchart diagram of a method 1000 of disseminating messages in a social networking platform in accordance with some embodiments. In some embodiments, method 1000 is performed by a server with one or more processors and memory. For example, in some embodiments, method 1000 is performed by server system 108 (Figures 1-2) or a component thereof (e.g. , server-side module 106, Figures 1-2) . In some embodiments, method 1000 is governed by instructions that are stored in a non-transitory computer readable storage medium and the instructions are executed by one or more processors of the server system. Optional operations are indicated by dashed lines (e.g. , boxes with dashed-line borders) .

The server receives (1002) a first instruction from a user in the social networking platform to initiate a recording process for sending a voice message to a recipient in the social networking platform. In some embodiments, the first instruction indicates detection of a user input at client device 104 to initiate a recording process such as clicking on a “record” button or performing a predetermined gesture on the user interface. For example, in response to detecting selection of recording affordance 522 in Figure 5A, the application associated with the social networking platform (e.g. , client-side module 1-2, Figures 1 and 3) , which is executed on client device 104, or a component thereof (e.g. , request handling module 334, Figure 3) sends a first instruction to server system 108 indicating that the user of client device 104 has initiated a recording process so as to send a voice message to recipient AA.

In some embodiments, the recipient is a machine (or programs executing on the machine) rather than a real user. The machine, unlike a real person, may not have the ability to process and interpret voice messages. For example, recipient AA is associated with one of one or more external services 122, and recipient AA provides car/taxi services, whereby a user is able to message recipient AA with the messaging function of the social networking platform to arrange for car/taxi service pickup.

Alternatively, in some embodiments, the recipient is a real-person associated with a user account in the social networking platform that has a setting turned on for the social networking platform to only accept text, rather than voice. For example, the user activates this feature when in a crowded environment, or when the user does not have ear buds or headphones plugged into his device. In another example, the application associated with the social networking system automatically activates this feature when environment data from one or more sensors 315 (Figures 3) of client device 104 (e.g. , ambient light data, ambient sound data, ambient temperature data, GPS location information, etc. ) associated with the user indicates that client device 104 (and thereby the user) is located in a crowded environment (e.g. , a restaurant, concert, or movie theater) , an environment where the user does not have access to the one or more speakers of client device 104, or does not have ear buds or headphones connected to client device 104.

In some embodiments, the recipient is associated with (1004) a public account in the social networking platform. In some embodiments, users associated with public accounts are in a different class of users that normal user of the social networking platform. For example, users with public accounts are registered to directly receive voice messages and/or converted text corresponding to the voice messages for auto replying to normal users in the social networking system. In some embodiments, a user is able to subscribe (i.e. , track, follow, friend, etc. ) to a user in the social networking platform with a public account without receiving authorization from the user with the public account； however, the a user is able to subscribe to a user in the social networking platform with a normal user account after receiving authorization from the user with the normal user account. In some embodiments, when a first user subscribes to a second user in the social networking platform, the second user is added to a contact list for the first user, and the first user receives messages from and is able to send messages to the second user.

In some embodiments, in response to receiving the first instruction from the user in the social networking platform to initiate the recording process, the server establishes (1006) a data connection with the client device of the user so as to stream the voice message. As such, at least the first portion of the voice message is obtained from the user via the established data connection. For example, in response to receiving the first instruction indicating that the user of client device 104 has initiated a recording process so as to send a voice message to recipient AA by selecting recording affordance 522 in Figure 5A, server system 108 or a component thereof (e.g. , obtaining/streaming module 226, Figure 2) establishes a connection with the application associated with the social networking platform (e.g. , client-side module 1-2, Figures 1 and 3) , which is executed on client device 104, via network (s) 110 so as to stream, in real-time, the voice message to server system 108.

During the recording process, the server obtains (1008) at least a first portion of the voice message from the user. In some embodiments, the voice message is streamed to server system 108 (as discussed above in relation to operation 1006) from an application on the user’s client device that is associated with the social networking platform. For example, in response to receiving the first instruction, server system 108 initiates the recording process whereby voice data corresponding to the voice message being captured at client device 104 is obtained by server system 108 (e.g. , via the established stream) .

During the recording process, the server performs (1010) speech-to-text (STT) processing to convert the first portion of the voice message to a first text portion. In some embodiments, server system 108 or a component thereof (e.g. , STT module 228, Figure 2) converts the streamed voice message to text by performing STT processing (or natural language processing) on the streamed voice message and provides the converted text to the application executed on client device 104 for display to the user in the messaging interface of the application.

In some embodiments, the server performs (1012) the STT processing in real-time on at least a first portion of the voice message. In some embodiments, STT processing is performed in real-time on the voice message as the voice message is streamed to server system 108.

During the recording process, the server provides (1014) the first text portion to the user for display at a client device corresponding to the user. In some embodiments, server system 108 or a component thereof (e.g. , providing module 230, Figure 2) provides the STT processing result to client device 104. As such, the application associated with the social networking platform (e.g. , client-side module 102, Figures 1 and 3) or a component thereof (e.g. , presenting module 342, Figure 3) displays the converted text in real-time (or near real-time) as the user speaks. In some embodiments, the first text portion is displayed in a message preparation region of the interface for the application, as real-time feedback to the user’s voice input. For example, in Figures 5B-5D, client device 104 displays converted text received from server system 108 in speech-to-text region 526 (i.e. , the message preparation region) of the messaging interface for the application associated with the social networking platform as the user speaks out his/her voice message. In Figures 5A-5D, for example, the messaging interface for the application associated with the social networking platform shows a message thread between the user of client device 104 and recipient AA.

In some embodiments, during the recording process, the server (1016) : after providing the first text portion to the user, obtains a third instruction from the user to provide an alternate speech-to-text conversion for a selected sub-portion in the provided first text portion； and provides the alternate speech-to-text conversion corresponding to the selected sub-portion for display at the client device corresponding to the user. In some embodiments, a user of the application for the social networking platform is able to slide his/her finger off the “record” button and hover over a word in the message preparation area, which the server’s STT incorrectly converted to text. In Figure 5D, for example, speech-to-text region 526 (i.e. , the message preparation region) displays the complete converted text received from server system 108 for the voice message: “Hi, Recipient AA. Please send an executive car to the corner of Colombia and Telephone. ” Figure 5D shows, for example, client device 104 detecting a slide gesture whereby contact 524 moves from location 528-ato location 528-b corresponding to a sub-portion of the text in speech-to-text region 526 (e.g. , the word “Colombia” ) . For example, in response to detecting selection of a sub-portion of the text in speech-to-text region 526 (e.g. , the word “Colombia” ) , the application associated with the social networking platform (e.g. , client-side module 1-2, Figures 1 and 3) , which is executed on client device 104, sends a third instruction to server system 108 indicating that the user of client device wishes to see a replacement or alternate word (s) for the selected sub-portion of the text. For example, server system 108 sends one or more alternate word (s) to client device 104 for display in speech-to-text region 526 of the messaging interface at a predefined frequency. As such, server system 108 sends alternate word (s) one at a time every X seconds for replacement of the selected sub-portion of the text until the one or more alternate word (s) are exhausted or the user of client device 104 is either satisfied with the alternate word (s) or takes another action. For example, in Figure 5E, after one or more alternate words for “Colombia” have been presented to the user of client device 104, the word “Colombia” is replaced with “Columbia” in speech-to-text region 526. For example, the user of client device 104 is satisfied with “Columbia” as a replacement for “Colombia. ”

In some embodiments, during the recording process, the server (1018) : after providing the first text portion to the user and prior to obtaining the second instruction from the user to terminate the recording process, obtains at least a second portion of the voice message from the user distinct from the at least one first portion of the voice message； performs STT processing to convert the second portion of the voice message to a second text portion； and provides the second text portion to the user for concurrent display at the client device corresponding to the user with the first text portion. In some embodiments, after viewing the first STT portion, the user of client device 104 decides to append additional information to the voice message. In Figure 5H, for example, client device 104 detects contact 534 at a location corresponding to recording affordance 522. For example, after selecting recording affordance 522 in Figure 5H, the application associated with the social networking platform (e.g. , client-side module 102, Figures 1 and 3) , which is executed on client device 104, un-pauses the recording process in order for the user of client device 104 to record an addition to the voice message corresponding to the text in speech-to-text region 526. In Figure 5I, for example, client device 104 displays text in speech-to-text region 526 that reads, “Hi, Recipient AA. Please send an executive car to the corner of Columbia and Telegraph. We need it in 10 minutes, ” after the user of client device 104 records an addition to the voice message in Figure 5H.

During the recording process, after providing the first text portion to the user, the server receives (1020) a second instruction from the user to terminate the recording process. For example, the user terminates the recording process by lifting his/her finger off a “record” button in the interface for the social networking application. For example, in response to detecting lift-off of contact 524 in Figure 5D from location 528-a, Figure 5E from location 528-b, or Figure 5F from location 528-c, the application associated with the social networking application (e.g. , client-side module 102, Figures 1 and 3) , which is executed on client device 104, sends a second instruction to server system 108 indicating termination of the recording process and, also, that the user wishes to send recipient AA the voice message and/or converted text corresponding to the voice message (e.g. , the text displayed in speech-to-text-region 526 in Figure 5D, Figure 5E, or Figure 5F, respectively) . In another example, Figure 5I shows that client device 104 is no longer detecting contact 534 (e.g. , as shown in Figure 5H) . Thus, in this example, the application associated with the social networking platform sends a second instruction to server system 108 indicating termination of the recording process and, also, that the user wishes to send recipient AA the voice message and/or converted text corresponding to the voice message (e.g. , the text displayed in speech-to-text-region 526 in Figure 5I) . In another example, third region 531 in Figure 5G includes send affordance 532, which, when activated (e.g. , with a tap gesture) causes the application executed on client device 104 to send a second instruction to server system 108 indicating termination of the recording process and, also, that the user wishes to send recipient AA the voice message and/or converted text corresponding to the voice message (e.g. , the text displayed in speech-to-text-region 526 in Figure 5G) .

During the recording process, in response to receiving the second instruction from the user to terminate the recording process, the server determines (1022) a recipient preference for at least one of the voice message and a result of the speech-to-text processing. In some embodiments, the recipient is associated with a user profile which indicates that the user is a associated with a public account and whether the recipient prefers the raw voice message, the STT result, or both. In some embodiments, server system 108 or a component thereof (e.g. , determining module 236, Figure 2) determines whether a STT preference (sometimes also herein called a “recipient preference” ) in a user profile for the recipient (e.g. , stored in profiles database 116) identifies a preference for the voice message and/or the result of the speech-to-text processing on the voice message.

In some embodiments, during the recording process, in response to receiving the second instruction from the user to terminate the recording process and in accordance with a determination that the recipient is associated with a public account, the server determines a recipient preference for at least one of the voice message and a result of the STT processing. In some embodiments, server system 108 or a component thereof (e.g. , determining module 236, Figure 2) determines whether a STT preference (sometimes also herein called a “recipient preference” ) in a user profile for the recipient (e.g. , stored in profiles database 116) identifies a preference for the voice message and/or the result of the speech-to-text processing on the voice message and whether a flag in the user profile for the recipient indicates that the recipient is associated with a public account in the social networking system. In some embodiments, users with public accounts are registered to directly receive the voice messages and/or converted text corresponding to the voice messages for auto replying to normal users in the social networking system. For example, users (i.e. , developers) in the social networking system register a public account with server system 108 in order to utilize this additional feature.

In accordance with a determination that the recipient preference is for the result of the speech-to-text processing, the server provides (1024) at least the first text portion of the voice message to the recipient. In accordance with a determination that the recipient prefers to receive voice message, the server sends the voice message. In accordance with a determination that the recipient prefers to receive both, the server sends both the voice message and the result of the speech-to-text processing on the voice message. In some embodiments, server system 108 or a component thereof (e.g. , sending module 238, Figure 2) sends/provides at least one of the voice message and a result of the speech-to-text processing to the recipient based on the recipient preference identified by determining module 236.

In some embodiments, after providing at least the first text portion corresponding to the voice message to the recipient, the server obtains (1026) a response from the recipient and provides the response from the recipient to the user as a message via the social networking platform. In some embodiments, after providing at least the first text portion corresponding to the voice message and/or the voice message to the recipient and/or the voice message to the recipient, server system 108 or a component thereof (e.g. , response handling module 240, Figure 2) obtains a response to the first text portion corresponding to the voice message and/or the voice message to be sent to the user In some embodiments, the response is an automatic reply generated by the recipient based on the user’s identity (i.e. , the sender’s identity) and the content of the voice message and/or text portion corresponding to the voice message. For example, the response is a post or message sent through the social networking platform. In another example, the response is sent via a communication method indicated by preferred contact information in a user profile (e.g. , stored in profiles database 116, Figures 1-2) for the user (e.g. , an email, SMS, IM, VoIP call, voicemail, and/or other communication method) .

In some embodiments, in accordance with a determination that the recipient preference is for the voice message, the server (1028) : provides at least the first portion of the voice message to the recipient； obtains, from the recipient, a recipient-converted text portion corresponding to the first portion of the voice message, where the recipient performs speech-to-text processing on at least the first portion of the voice message so as to convert at least the first portion of the voice message to the recipient-converted text portion； compares the recipient-converted text portion to the first text portion； and, in accordance with a determination that the recipient-converted text portion is different from the first text portion, adjusts at least one of a vocabulary for the speech-to-text processing or one or more parameters for speech-to-text processing performed on messages intended for the recipient. In some embodiments, after sending the voice message to the recipient, server system 108 receives text corresponding to the voice message from the recipient, where the recipient performed STT processing on the voice message to obtain the text corresponding to the voice message. In some embodiments, server system 108 or a component thereof (e.g. , STT tuning module 234, Figure 2) adjusts parameters of STT module 228 (Figure 2) based on the text corresponding to the voice message from the recipient. In some embodiments, the recipient performs STT processing based on a custom vocabulary or custom language models tailored to the services of the recipient whereas server system 108 performs STT processing with more generally applicable vocabulary and language models.

For example, when the text corresponding to the voice message received from the recipient differs from the result of STT processing performed at server system 108, STT tuning module 234 adjusts one or more parameters of STT module 228. In another example, when a comparison between the text corresponding to the voice message received from the recipient and the result of STT processing performed at server system 108 indicates a count of differences greater than a predefined count, STT tuning module 234 adjusts one or more parameters of STT module 228. For example, the vocabulary and/or language models for STT tuning module 234 are adjusted. In another example, STT tuning module 234 is only adjusted for messages directed to the particular recipient. As such, STT processing at server system 108 will be customized for subsequent voice messages intended for the recipient. In other words, for example, a customized STT profile for the recipient is developed by STT tuning module 234 for subsequent messages intended for the recipient.

It should be understood that the particular order in which the operations in Figures 10A-10C have been described is merely exemplary and is not intended to indicate that the described order is the only order in which the operations could be performed. One of ordinary skill in the art would recognize various ways to reorder the operations described herein. Additionally, it should be noted that details of other processes described herein with respect to other methods described herein (e.g. ,

methods

600, 700, 800, and 900) are also applicable in an analogous manner to method 1000 described above with respect Figures 10A-10C.

Figure 11 is a block diagram of a server-side module 106 in accordance with some embodiments. In some embodiments, server-side module 106 performs server-side data processing for a social networking platform implemented in server-client environment 100. For example, server-side module 106 is executed on server system 108 (e.g. , a server) , and server-side module 106 manages and operates the social networking platform. In some embodiments, server-side module 106 includes the following modules: first receiving module 1102； first sending module 1104； second receiving module 1106； and second sending module 1108.

In some embodiments, first receiving module 1102 is configured to receive an identifier of a user account, an identifier of a public account and social application information that are sent by a terminal. In some embodiments, the social application information includes text information, a voice data packet, or voice data for a plurality of voice data packets. In some embodiments, in response to a receiving a start command (i.e. , a first instruction) from the terminal to initiate a recording process, the server periodically receives voice data.

In some embodiments, first receiving module 1102 includes the following sub-units: first encapsulation sub-unit 1112； second encapsulation sub-unit 1114； conversion sub-unit 1116； and combination sub-unit 1118.

In some embodiments, first encapsulation sub-unit 1112 is configured to encapsulate received voice data into a voice data packet with a sequence number after each predefined time period.

In some embodiments, second encapsulation sub-unit 1114 is configured to: encapsulate received voice data for a current time period into a voice data packet with a sequence number and a terminator in response to receiving a stop command (i.e. , a second instruction) from the terminal.

In some embodiments, conversion sub-unit 1116 is configured to perform voice recognition (e.g. , speech-to-text (STT) processing) on the received voice data packet (s) to convert the received voice data packet (s) into text information

In some embodiments, combination sub-unit 1118 is configured to combine the converted text information corresponding to the received voice data packets into combined converted text information according to the sequence numbers of the received voice data packets.

In some embodiments, first sending module 1104 is configured to send the identifier of the user account and the social application information (e.g. , text information, converted text information, and/or the combined converted text information) to a background server corresponding to the identifier of the public account.

In some embodiments, second receiving module 1106 is configured to receive the identifier of the user account, the identifier of the public account, and the reply information from the background server.

In some embodiments, second sending module 1108 is configured to send, through a public account corresponding to the identifier of the public account, the reply information to the terminal corresponding to the identifier of the user account.

While particular embodiments are described above, it will be understood it is not intended to limit the application to these particular embodiments. On the contrary, the application includes alternatives, modifications and equivalents that are within the spirit and scope of the appended claims. Numerous specific details are set forth in order to provide a thorough understanding of the subject matter presented herein. But it will be apparent to one of ordinary skill in the art that the subject matter may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.

Claims

A method of disseminating messages in a social networking platform, comprising: at a server with one or more processors and memory:

receiving a first instruction from a user in the social networking platform to initiate a recording process for sending a voice message to a recipient in the social networking platform；

during the recording process:

obtaining at least a first portion of the voice message from the user；

performing speech-to-text processing to convert the first portion of the voice message to a first text portion；

providing the first text portion to the user for display at a client device corresponding to the user； and

after providing the first text portion to the user, receiving a second instruction from the user to terminate the recording process；

in response to receiving the second instruction from the user to terminate the recording process, determining a recipient preference for at least one of the voice message and a result of the speech-to-text processing； and

in accordance with a determination that the recipient preference is for the result of the speech-to-text processing, providing at least the first text portion corresponding to the voice message to the recipient.
The method of claim 1, further comprising:

after providing at least the first text portion corresponding to the voice message to the recipient, obtaining a response from the recipient； and

providing the response from the recipient to the user as a message via the social networking platform.
The method of any of claims 1-2, further comprising:

during the recording process:

after providing the first text portion to the user, obtaining a third instruction from the user to provide an alternate speech-to-text conversion for a selected sub-portion in the provided first text portion； and

providing the alternate speech-to-text conversion corresponding to the selected sub-portion for display at the client device corresponding to the user.
The method of any of claims 1-3, further comprising:

during the recording process:

after providing the first text portion to the user and prior to obtaining the second instruction from the user to terminate the recording process, obtaining at least a second portion of the voice message from the user distinct from the at least one first portion of the voice message；

performing speech-to-text processing to convert the second portion of the voice message to a second text portion； and

providing the second text portion to the user for concurrent display at the client device corresponding to the user with the first text portion.
The method of any of claims 1-4, wherein the recipient is associated with a public account in the social networking platform.
The method of any of claims 1-5, wherein the speech-to-text processing is performed in real-time on at least the first portion of the voice message.
The method of any of claims 1-6, further comprising:

in response to receiving the first instruction from the user in the social networking platform to initiate the recording process, establishing a data connection with the client device of the user so as to stream the voice message,

wherein the at least the first portion of the voice message is obtained from the user via the established data connection.
The method of any of claims 1-7, further comprising:

in accordance with a determination that the recipient preference is for the voice message, providing at least the first portion of the voice message to the recipient；

obtaining, from the recipient, a recipient-converted text portion corresponding to the first portion of the voice message, wherein the recipient performs speech-to-text processing on at least the first portion of the voice message so as to convert at least the first portion of the voice message to the recipient-converted text portion；

comparing the recipient-converted text portion to the first text portion； and

in accordance with a determination that the recipient-converted text portion is different from the first text portion, adjusting at least one of a vocabulary for the speech-to-text processing or one or more parameters for speech-to-text processing performed on messages intended for the recipient.
A server, comprising:

one or more processors； and

memory storing one or more programs to be executed by the one or more processors, the one or more programs comprising instructions for:

receiving a first instruction from a user in the social networking platform to initiate a recording process for sending a voice message to a recipient in the social networking platform；

during the recording process:

obtaining at least a first portion of the voice message from the user；

performing speech-to-text processing to convert the first portion of the voice message to a first text portion；

providing the first text portion to the user for display at a client device corresponding to the user； and

after providing the first text portion to the user, receiving a second instruction from the user to terminate the recording process；

in response to receiving the second instruction from the user to terminate the recording process, determining a recipient preference for at least one of the voice message and a result of the speech-to-text processing； and

in accordance with a determination that the recipient preference is for the result of the speech-to-text processing, providing at least the first text portion corresponding to the voice message to the recipient.
The server of claim 9, wherein the one or more programs further comprise instructions for:

after providing at least the first text portion corresponding to the voice message to the recipient, obtaining a response from the recipient； and

providing the response from the recipient to the user as a message via the social networking platform.
The server of any of claims 9-10, wherein the one or more programs further comprise instructions for:

during the recording process:

after providing the first text portion to the user, obtaining a third instruction from the user to provide an alternate speech-to-text conversion for a selected sub-portion in the provided first text portion； and

providing the alternate speech-to-text conversion corresponding to the selected sub-portion for display at the client device corresponding to the user.
The server of any of claims 9-11, wherein the one or more programs further comprise instructions for:

during the recording process:

after providing the first text portion to the user and prior to obtaining the second instruction from the user to terminate the recording process, obtaining at least a second portion of the voice message from the user distinct from the at least one first portion of the voice message；

performing speech-to-text processing to convert the second portion of the voice message to a second text portion； and

providing the second text portion to the user for concurrent display at the client device corresponding to the user with the first text portion.
The server of any of claims 9-12, wherein the recipient is associated with a public account in the social networking platform.
The server of any of claims 9-13, wherein the speech-to-text processing is performed in real-time on at least the first portion of the voice message.
The server of any of claims 9-14, wherein the one or more programs further comprise instructions for:

in accordance with a determination that the recipient preference is for the voice message, providing at least the first portion of the voice message to the recipient；

obtaining, from the recipient, a recipient-converted text portion corresponding to the first portion of the voice message, wherein the recipient performs speech-to-text processing on at least the first portion of the voice message so as to convert at least the first portion of the voice message to the recipient-converted text portion；

comparing the recipient-converted text portion to the first text portion； and

in accordance with a determination that the recipient-converted text portion is different from the first text portion, adjusting at least one of a vocabulary for the speech-to-text processing or one or more parameters for speech-to-text processing performed on messages intended for the recipient.
A non-transitory computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which, when executed by a server with one or more processors, cause the server to perform operations comprising:

receiving a first instruction from a user in the social networking platform to initiate a recording process for sending a voice message to a recipient in the social networking platform；

during the recording process:

obtaining at least a first portion of the voice message from the user；

performing speech-to-text processing to convert the first portion of the voice message to a first text portion；

providing the first text portion to the user for display at a client device corresponding to the user； and

after providing the first text portion to the user, receiving a second instruction from the user to terminate the recording process；

in response to receiving the second instruction from the user to terminate the recording process, determining a recipient preference for at least one of the voice message and a result of the speech-to-text processing； and

in accordance with a determination that the recipient preference is for the result of the speech-to-text processing, providing at least the first text portion corresponding to the voice message to the recipient.
The non-transitory computer readable storage medium of claim 16, wherein the instructions cause the server to perform operations further comprising:

after providing at least the first text portion corresponding to the voice message to the recipient, obtaining a response from the recipient； and

providing the response from the recipient to the user as a message via the social networking platform.
The non-transitory computer readable storage medium of any of claims 16-17, wherein the instructions cause the server to perform operations further comprising:

during the recording process:

after providing the first text portion to the user, obtaining a third instruction from the user to provide an alternate speech-to-text conversion for a selected sub-portion in the provided first text portion； and

providing the alternate speech-to-text conversion corresponding to the selected sub-portion for display at the client device corresponding to the user.
The non-transitory computer readable storage medium of any of claims 16-18, wherein the instructions cause the server to perform operations further comprising:

during the recording process:

after providing the first text portion to the user and prior to obtaining the second instruction from the user to terminate the recording process, obtaining at least a second portion of the voice message from the user distinct from the at least one first portion of the voice message；

performing speech-to-text processing to convert the second portion of the voice message to a second text portion； and

providing the second text portion to the user for concurrent display at the client device corresponding to the user with the first text portion.
The non-transitory computer readable storage medium of any of claims 16-19, wherein the instructions cause the server to perform operations further comprising:

in accordance with a determination that the recipient preference is for the voice message, providing at least the first portion of the voice message to the recipient；

obtaining, from the recipient, a recipient-converted text portion corresponding to the first portion of the voice message, wherein the recipient performs speech-to-text processing on at least the first portion of the voice message so as to convert at least the first portion of the voice message to the recipient-converted text portion；

comparing the recipient-converted text portion to the first text portion； and

in accordance with a determination that the recipient-converted text portion is different from the first text portion, adjusting at least one of a vocabulary for the speech-to-text processing or one or more parameters for speech-to-text processing performed on messages intended for the recipient.