CN113936638B

CN113936638B - Method, device and terminal device for audio playback of text

Info

Publication number: CN113936638B
Application number: CN202010603452.4A
Authority: CN
Inventors: 罗义; 王守诚; 谢鲁冰
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-06-29
Filing date: 2020-06-29
Publication date: 2025-05-09
Anticipated expiration: 2040-06-29
Also published as: CN113936638A

Abstract

The present application provides a method, device and terminal device for audible playback of text, and relates to the field of terminal technology. The method includes: identifying non-text information in a target text; determining the audio information corresponding to the non-text information; and audibly playing the target text according to the audio information corresponding to the non-text information. The method for audible playback of text provided in this embodiment can fully express the non-text information in the target text during the process of audibly playing the target text, improve the expression effect of the terminal device on the text information, and improve the user experience.

Description

Method and device for playing text in sound and terminal equipment

Technical Field

The present application relates to the field of terminal technologies, and in particular, to a method and an apparatus for playing text in voice, and a terminal device.

Background

The terminal devices typically have text-reading capabilities. At present, the mode of reading the text by the terminal equipment is to play pronunciation audio corresponding to characters in the text, so that the aim of reading the text is fulfilled. But the text read by the terminal device may include not only characters but also some non-text information, such as punctuation marks, emoticons, underline fonts, etc., for expressing sentence patterns, mood, emotion, etc., or highlighting part of the characters. At present, when the terminal equipment reads the text, only the text in the text is read, so that the text containing non-text information cannot fully express the non-text information in the text, and the expression effect on the text information is poor.

Disclosure of Invention

The application provides a method, a device and a terminal device for playing a text in a sound way, which solve the problem that the expression effect on text information is poor when the terminal device reads the text in the prior art.

In order to achieve the above purpose, the application adopts the following technical scheme:

In a first aspect, the application provides a method for playing a text in a sound way, which comprises the steps of identifying non-text information in a target text, determining audio information corresponding to the non-text information, and playing the target text in a sound way according to the audio information corresponding to the non-text information.

According to the method for playing the text in the sound, the audio information corresponding to the non-text information in the target text can be played in the process of playing the target text in the sound, so that the non-text information in the text is fully expressed, the expression effect on the text information is improved, and the user experience is improved.

With reference to the first aspect, in some embodiments, the non-text information includes an emoji symbol, a typeset control symbol, a punctuation symbol, a mathematical symbol, an annotation symbol, or a characteristic font style of text.

With reference to the first aspect, in some embodiments, when the target text includes text and the non-text information, determining audio information corresponding to the non-text information includes:

If the non-text information is a first symbol, determining an application type of the first symbol according to the semantics of the target text, wherein the first symbol comprises audio information corresponding to at least two application types;

and determining the audio information of the first symbol according to the application type of the first symbol.

For the first symbol with different reading methods in different language scenes, the audio information corresponding to the first symbol and the scene can be determined by identifying the semantics of the target text, so that the terminal equipment is prevented from playing the audio information unsuitable for the scene, and the expression effect of the text information is improved.

With reference to the first aspect, in some embodiments, when the target text includes text and the non-text information, playing the target text in a sound according to audio information corresponding to the non-text information includes:

and if the non-text information is the expression symbol, the typesetting control symbol, the punctuation mark, the mathematical symbol or the annotation symbol, sequentially playing the audio information corresponding to the text and the audio information corresponding to the non-text information according to the arrangement sequence of the text and the non-text information in the target text.

If the non-text information is the annotation symbol, identifying the annotation text corresponding to the annotation symbol;

And sequentially playing the audio information corresponding to the words and the audio information corresponding to the annotation words according to the arrangement sequence of the words and the annotation symbols in the target text.

and after the audio information of all the words of the sentence where the non-word information is located is played, playing the audio information corresponding to the annotation word.

By playing the audio information corresponding to the annotation text in the process of playing the audio information of the text, the user can know the text information in more detail.

With reference to the first aspect, in some embodiments, when the target text includes text and non-text information, according to audio information corresponding to the non-text information, playing the target text in a sound includes:

If the non-text information is the characteristic font style, playing the audio information corresponding to the characteristic font style while playing the audio information corresponding to the text with the characteristic font style, and taking the audio information corresponding to the characteristic font style as background sound of the audio information corresponding to the text with the characteristic font style.

With reference to the first aspect, in some embodiments, the determining the audio information corresponding to the non-text information includes:

and determining the audio information corresponding to the non-text information from a preset audio information base according to the identification information of the non-text information.

In a second aspect, the present application provides a device for playing text in sound, the device comprising:

The identification unit is used for identifying non-text information in the target text;

the determining unit is used for determining the audio information corresponding to the non-text information;

And the playing control unit is used for playing the target text in a sounding way according to the audio information corresponding to the non-text information.

In a third aspect, the present embodiment provides a terminal device, including a speaker, a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method for playing text in sound according to the first aspect when the processor executes the computer program.

In a fourth aspect, the present embodiment provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the method for audio playback of text as described in the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer program product, which when run on a terminal device, causes the terminal device to perform the method for audio playback of text according to the first aspect.

In a sixth aspect, an embodiment of the present application provides a chip system, where the chip system includes a processor, where the processor is coupled to a memory, and the processor executes a computer program stored in the memory, to implement the method for playing text in sound according to the first aspect. In this embodiment, the chip system may be a single chip, or a chip module formed by a plurality of chips.

It will be appreciated that the advantages of the second to sixth aspects may be found in the relevant description of the first aspect, and are not described here again.

Drawings

Fig. 1 is a schematic structural diagram of a mobile phone to which a method for playing text with sound according to an embodiment of the present application is applicable;

fig. 2 is a schematic structural diagram of a processor to which the method for playing text with sound according to the embodiment of the present application is applicable;

fig. 3 is a schematic diagram of a software architecture to which the method for playing text with sound provided by the embodiment of the present application is applicable;

Fig. 4 is a schematic flow chart of a method for playing text with sound according to an embodiment of the present application;

fig. 5a is a schematic diagram of user control for audio playing of text according to an embodiment of the present application;

Fig. 5b is a second user control schematic diagram of audio playing of text according to an embodiment of the present application;

fig. 6a is a schematic diagram of an emoji emoticon according to an embodiment of the application;

fig. 6b is a schematic diagram of an emoji emoticon according to an embodiment of the application;

Fig. 6c is a schematic diagram III of an emoji emoticon according to an embodiment of the application;

fig. 6d is a schematic diagram of an emoji emoticon according to an embodiment of the application;

FIG. 7 is a schematic diagram showing a program code according to an embodiment of the present application;

FIG. 8 is a second schematic diagram of a program code according to an embodiment of the present application;

fig. 9 is a second flow chart of a method for playing text with sound according to an embodiment of the present application;

fig. 10 is a schematic structural diagram of a text audio playing device according to an embodiment of the present application.

Detailed Description

The method for playing the text in the sound way can be applied to terminal equipment with audio information playing functions such as mobile phones, tablet computers, electronic readers, notebook computers, netbooks, wearable equipment and the like, and the embodiment of the application does not limit the specific type of the terminal equipment.

Take the example that the terminal device is a mobile phone. Fig. 1 is a block diagram showing a part of a structure of a mobile phone according to an embodiment of the present application. Referring to fig. 1, the mobile phone includes Radio Frequency (RF) circuit 110, memory 120, input unit 130, display unit 140, sensor 150, audio circuit 160, wireless fidelity (WIRELESS FIDELITY, wiFi) module 170, processor 180, and power supply 190. Those skilled in the art will appreciate that the handset configuration shown in fig. 1 is not limiting of the handset and may include more or fewer components than shown, or may combine certain components, or may be arranged in a different arrangement of components.

The following describes the components of the mobile phone in detail with reference to fig. 1:

The RF circuit 110 may be used for receiving and transmitting signals during the process of receiving and transmitting information or communication, specifically, receiving downlink information of a base station, processing the downlink information by the processor 180, and transmitting uplink data to the base station. Typically, RF circuitry includes, but is not limited to, antennas, at least one amplifier, transceivers, couplers, low noise amplifiers (Low Noise Amplifier, LNAs), diplexers, and the like. In addition, RF circuit 110 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol including, but not limited to, global system for mobile communications (Global System of Mobile communication, GSM), general Packet Radio Service (GPRS), code division multiple access (Code Division Multiple Access, CDMA), wideband code division multiple access (Wideband Code Division Multiple Access, WCDMA), long term evolution (Long Term Evolution, LTE)), email, short message Service (Short MESSAGING SERVICE, SMS), and the like.

The memory 120 may be used to store software programs and modules, and the processor 180 performs various functional applications and data processing of the cellular phone by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a storage program area that may store an operating system, an application program required for at least one function (such as an audio information playing function, a text display function, etc.), etc., and a storage data area that may store data created according to the use of the cellular phone (such as audio information, a phonebook, etc.), etc. In addition, memory 120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.

The input unit 130 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the handset. In particular, the input unit 130 may include a touch panel 131 and other input devices 132. The touch panel 131, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch panel 131 or thereabout by using any suitable object or accessory such as a finger, a stylus, etc.), and drive the corresponding connection device according to a predetermined program. Alternatively, the touch panel 131 may include two parts of a touch detection device and a touch controller. The touch controller receives touch information from the touch detection device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 180, and can receive and execute commands sent by the processor 180. In addition, the touch panel 131 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 130 may include other input devices 132 in addition to the touch panel 131. In particular, other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.

The display unit 140 may be used to display information input by a user or information provided to the user and various menus of the mobile phone. The display unit 140 may include a display panel 141, and alternatively, the display panel 141 may be configured in the form of a Liquid crystal display (Liquid CRYSTAL DISPLAY, LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 131 may cover the display panel 141, and when the touch panel 131 detects a touch operation thereon or thereabout, the touch panel is transferred to the processor 180 to determine the type of the touch event, and then the processor 180 provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although in fig. 1, the touch panel 131 and the display panel 141 are two independent components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 131 and the display panel 141 may be integrated to implement the input and output functions of the mobile phone.

The handset may also include at least one sensor 150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 141 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 141 and/or the backlight when the mobile phone moves to the ear. As one type of motion sensor, the accelerometer sensor can detect the acceleration in all directions (typically three axes), and can detect the gravity and direction when stationary, and can be used for applications for recognizing the gesture of a mobile phone (such as horizontal-vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer, knocking), and the like. Other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured with the handset are not described in detail herein.

The audio circuit 160, speaker 161, microphone 162 may provide an audio interface between the user and the handset. The audio circuit 160 may transmit the received electrical signal converted from audio data to the speaker 161 for conversion into sound signals for output by the speaker 161, while the microphone 162 may convert the collected sound signals into electrical signals for reception by the audio circuit 160 for conversion into audio data, for processing by the audio data output processor 180, for transmission to another cell phone, for example, via the RF circuit 110, or for outputting the audio data to the memory 120 for further processing.

WiFi belongs to a short-distance wireless transmission technology, and a mobile phone can help a user to send and receive emails, browse webpages, access streaming media and the like through the WiFi module 170, so that wireless broadband Internet access is provided for the user. Although fig. 1 shows a WiFi module 170, it is understood that it does not belong to the necessary constitution of the handset, and can be omitted entirely as required within the scope of not changing the essence of the invention.

The processor 180 is a control center of the mobile phone, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions and processes data of the mobile phone by running or executing software programs and/or modules stored in the memory 120 and calling data stored in the memory 120, thereby performing overall monitoring of the mobile phone. Alternatively, the processor 180 may include one or more processing units, and preferably the processor 180 may integrate an application processor that primarily processes operating systems, user interfaces, applications, etc., with a modem processor that primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 180.

Referring to fig. 2, the processor 180 may include a Text control unit (Text View) 1801, a Text rendering unit (Draw Text) 1802, a Text rendering engine 1803, a speakable trigger 1804, and a Text To Speech (TTS) engine 1805. The text control unit 1801 is configured to determine display information such as content, font size, and display shape of the text. A text drawing unit 1802 for controlling a typesetting style of text. The text rendering engine 1803 is configured to finally determine a display image of the text in the display interface according to the display information and the typesetting style of the text. And the read-aloud trigger 1804 is used for selecting the text according to the touch operation of the user, determining the text as the target text, and controlling the mobile phone to start to play the target text in a sounding manner. A text-to-speech engine 1805 for playing audio information corresponding to the target text in cooperation with the speaker 161.

The handset further includes a power supply 190 (e.g., a battery) for powering the various components, which may preferably be logically connected to the processor 180 via a power management system so as to provide for managing charging, discharging, and power consumption by the power management system.

Although not shown, the handset may also include a camera. Optionally, the position of the camera on the mobile phone may be front or rear, which is not limited by the embodiment of the present application. Alternatively, the mobile phone may include a single camera, a dual camera, or a triple camera, which is not limited in the embodiment of the present application. For example, a cell phone may include three cameras, one of which is a main camera, one of which is a wide angle camera, and one of which is a tele camera. When the mobile phone includes a plurality of cameras, the plurality of cameras may be all front-mounted, all rear-mounted, or one part of front-mounted, another part of rear-mounted, which is not limited by the embodiment of the present application.

In addition, although not shown, the mobile phone may further include a bluetooth module, etc., which will not be described herein.

Fig. 3 is a schematic software structure of a mobile phone according to an embodiment of the application. Taking a mobile phone operating system as an Android system as an example, in some embodiments, the Android system is divided into four layers, namely an application layer, an application framework layer (FWK), a system layer and a hardware abstraction layer, and the layers are communicated through software interfaces.

As shown in fig. 3, the application layer may be a series of application packages, where the application packages may include applications such as short messages, calendars, cameras, video, navigation, gallery, phone calls, etc.

The application framework layer provides an application programming interface (application programming interface, API) and programming framework for the application of the application layer. The application framework layer may include some predefined functions, such as functions for receiving events sent by the application framework layer.

As shown in fig. 3, the application framework layer may include a window manager, a resource manager, a notification manager, and the like.

The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like. The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.

The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.

The notification manager allows the application to display notification information in a status bar, can be used to communicate notification type messages, can automatically disappear after a short dwell, and does not require user interaction. Such as notification manager is used to inform that the download is complete, message alerts, etc. The notification manager may also be a notification in the form of a chart or scroll bar text that appears on the system top status bar, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, a text message is prompted in a status bar, a prompt tone is emitted, the electronic device vibrates, and an indicator light blinks, etc.

The application framework layer may further include:

A view system including visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.

The telephone manager is used for providing communication functions of the mobile phone. Such as the management of call status (including on, hung-up, etc.).

The system layer may include a plurality of functional modules. Such as a sensor service module, a physical state recognition module, a three-dimensional graphics processing library (e.g., openGL ES), etc.

The sensor service module is used for monitoring sensor data uploaded by various sensors of the hardware layer and determining the physical state of the mobile phone;

the physical state recognition module is used for analyzing and recognizing gestures, faces and the like of the user;

the three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.

The system layer may further include:

The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.

Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.

The hardware abstraction layer is a layer between hardware and software. The hardware abstraction layer may include display drivers, camera drivers, sensor drivers, etc. for driving the relevant hardware of the hardware layer, such as a display screen, camera, sensor, etc.

The following embodiments may be implemented on a terminal device having the above-described hardware/software structure. The following embodiment will take a mobile phone as an example to describe a method for playing text with sound according to the embodiment of the present application.

Cell phones typically have text reading capabilities. At present, the mode of reading the text by the mobile phone only plays the pronunciation audio corresponding to the characters in the text, thereby achieving the purpose of reading the text. But these texts may include not only text but also some non-text information such as punctuation, emoticons, underlines, etc. for expressing a sentence reading, mood, emotion, etc. of the text or highlighting part of the text. Therefore, for the text containing the non-text information, when the mobile phone adopts the text reading method, the non-text information in the text cannot be fully expressed, and the expression effect on the text information is affected.

For example, for the text "please know you. About. About.A current way for a mobile phone to read the text is to play only the audio information in which all the words" please know you "are read. However, the non-text information "≡a" in the text is not expressed, and the expression effect of the text information is affected.

Therefore, the embodiment of the application provides a sound playing method of a text, which can express non-text information in the text and improve the expression effect of the text information.

Referring to fig. 4, a flowchart of a method for playing text with sound is provided in an embodiment of the present application. The method comprises the following steps S401-S403.

S401, the mobile phone identifies non-text information in the target text.

In this embodiment, the target text refers to text selected by the user to be played by the mobile phone in a voice, which may include all content in one text (for example, a novel) or may be a part of the text (for example, some paragraphs in the novel), which is not limited in this embodiment.

In some embodiments, referring to fig. 5a, when the mobile phone displays text, the user may select a part of the text as a target text by touching, click a play icon, and input a play instruction to the mobile phone to control the mobile phone to play the target text in a sounding manner. In fig. 5a, the text selected by the user is shaded. In other embodiments, referring to fig. 5b, the user may also directly click on the play icon to control the mobile phone to take the text as the target text and play the text audibly.

The target text stored in the mobile phone comprises identification information of each text and non-text information. The identification information is used for uniquely indicating a text or non-text information. Through the identification information of the target text, the mobile phone can identify the text and the non-text information in the text.

The identification information may be, for example, a character code such as Unicode. For example, the "king" 16-bit Unicode is U+738B, "," 16-bit Unicode is U+002C, the tabulated 16-bit Unicode of the table symbol is U+0009, the underlined 16-bit Unicode is U+2381, and the like.

The text related to this embodiment may be in various language forms such as chinese, english, japanese, and french, and may be, for example, "happy", "o", "Bonjour", and the like. The non-textual information may be emoticons, punctuation marks, mathematical symbols, typesetting control symbols, annotation symbols, or character font styles of words, etc.

Among them, an expression symbol is generally used to express a certain expression, such as a happy expression, a lacrimation expression, a boring expression, a cheering expression, and the like. However, in the present embodiment, the emoticons are not limited thereto, but may be used to represent something such as moon, christmas tree, house, flower, etc.

In some embodiments, the emoticons may be combined from some non-literal symbols, or from a combination of non-literal symbols and literal. By way of example, these non-literal symbols may be "," _ "," "", "_", "", "-", ">", "<", "|", "\" and "/", etc. The combined expression symbols can be "<" > "," > "<" > ", Y (") Y, etc., each emoticon is used to represent an expression, and can be seen in table 1.

Table 1 correspondence table 1

In other embodiments, the emoticon may be an emoji emoticon. Illustratively, the emoji emoticon shown in fig. 6a is used to represent an open smiling emotion. Fig. 6b shows emoji emoticons for expressing the expression of lacrimation. The emoji emoticon shown in fig. 6c is used to represent an unobtrusive emotion. The emoji emoticon shown in fig. 6d is used to represent the expression of panic.

In this embodiment, punctuation marks include ",". ","; ", and", "|", "(", ")", "{", "" - "," - ",".

In this embodiment, mathematical symbols include symbols for data calculation or representing units such as "+", "-", "" ","/"," log "," m (meter) "," mm (millimeter) ",".

The typesetting control symbols can be tab symbols, line-feed symbols, section symbols and the like, and are used for controlling characters, punctuation marks, expression symbols and the like typeset on a display interface, so that the layers of the text are clearly expressed and the text is convenient for a user to read. These typesetting control symbols may be displayed in the display interface or may be hidden in the display interface according to the user's setting. For example, when a tab symbol is displayed in the display interface, it may be indicated by "→". The line feed may be represented by "CRLF" when displayed in the display interface. When a space is displayed in the display interface, it may be indicated by a lighter color than the letter "·".

The annotation symbol may be a superscript or subscript of a word or word corresponding to the annotation word associated with the word or word. For example, in the text "what is unknown on the day imperial palace ①, what is the year of the day", the annotation symbol is "①", and the corresponding annotation letter is "imperial palace" for palace, which is imperial palace "because there is a double-threshold outside the palace.

The annotation symbol may be a symbol in the program code for representing an explanation. For example, a first note symbol "/", a second note symbol "/", and a third note symbol "//". In the program code, "/" is used in conjunction with "/" and the words between "/" and "/" are words of comments which may occupy multiple lines. The words following "//" are usually used alone, i.e. the annotation words, which usually occupy only one line.

The characteristic font style may be italic, underlined, strikethrough, bolded, under-colored, or the like. For example, in the text "you should ensure that the information is correct", the feature font style is an underlined font.

S402, the mobile phone determines audio information corresponding to the non-text information.

The mobile phone is preset with an audio information base which comprises audio information corresponding to text and non-text information. For example, the correspondence between text or non-text information and audio information may be as shown in table 2.

Table 2 correspondence table two

For a word, its audio information generally corresponds to the pronunciation of the word. For example, the pronunciation corresponding to the audio information of the word "you" is [ you ]. The pronunciation corresponding to the audio information of the word "good" is [ good ].

For non-text information, the emotion expressed by the audio information corresponding to the expression symbol should be the same as the emotion expressed by the expression symbol. For example, for expression of smiling faceThe pronunciation corresponding to the audio information can be [ haha ]. The tone expressed by the audio information of the punctuation mark should be the same as the tone expressed by the punctuation mark. For example, for punctuation "? which may correspond to audio information pronounced as a" a-ta ". The audio information corresponding to the characteristic font style should be convenient for the user to clearly understand the information conveyed by the characteristic font style. For example, the audio information corresponding to the underlined font may be the sound of a pencil scribe. The specific content of the audio information corresponding to the non-text information is not limited in this embodiment.

It should be noted that, in the audio information base, each text or non-text information corresponds to at least one audio information. For example, the audio information of the word "you" may be a female sound uttered as "you" or a male sound uttered as "you". Or the audio information corresponding to the underline can be the sound of the pencil line or the sound of the pen line. Still alternatively, the pronunciation corresponding to the audio information of "/" may be "or" (for example, in the text "s/he") or "divided" (for example, in the text "if a=100, b=20, please calculate the value of a/b").

In addition, for ",". The occurrence frequency of the punctuation marks used for representing sentence reading is too high, and corresponding audio information can not be set in the audio information base, so that the situation that the audio information is inserted too frequently in the process of audio playing of the text, and the user experience is reduced is avoided.

S403, the mobile phone plays the target text in a sound mode according to the audio information corresponding to the non-text information.

In the target text, both the text and the non-text information have certain position information, and the position information is used for representing the arrangement order of the text and the non-text information in the target text. For example, for a character (including text, punctuation, mathematical symbols, emoticons, notes, etc.), when its position information is 5, it represents that the character is the 5 th character in the target text. For a characteristic font style (e.g., bolded font, italic or underlined, etc.), when its position information is 10 to 15, it means that the characteristic font style is at the position of the 10 th to 15 th characters.

For convenience of description, the positional information a to b is hereinafter collectively denoted as [ a, b ], where a is less than or equal to b, and a and b are integers. For example, the present embodiment represents the position information 5 as [5,5], and the position information 10 to 15 as [10,15].

In this embodiment, the character displayed or uttered as a whole corresponds to one display position, and each display position corresponds to one position information. For example, a Chinese character (e.g., "you", "good"), an English word (e.g., "happy", "a", etc.), a punctuation mark (e.g., "'," etc.), an emoji symbol (e.g., "a_" or emoji "), an annotation symbol (e.g.," ^①"、"^② "), a typesetting control symbol (e.g., space, tab, or line feed), etc., respectively correspond to a display position, and respectively have a corresponding position information. Of course, the method for determining the location information may also be in other forms, and the embodiment is not limited.

Taking the target text of ' king teacher good ' and ' happy knowledge of your, wherein the position information of the words ' king teacher good ' in the target text is [1,5], punctuation marks ', ' position information in the target text is [6,6], expression characters ' expression ' in the target text is [7,7 ', ' happy knowledge of your ' in the target text ' is [8,13] in sequence.

Taking the example of the target text "you should ensure that the information is accurate", the position information of the underlined font in the target text is [8,11].

Taking the target text "unknown what year is the day imperial palace ^①" as an example, and the annotation character corresponding to the annotation symbol ^① is imperial palace, which is a palace, and is imperial palace "because there is a double threshold outside the palace. Wherein the location information of the annotation symbol ^① in the target text is [7,7].

In some embodiments, only non-textual information is included in the target text. Then, in the process of playing the target text in a sound way, the mobile phone directly plays the audio information corresponding to the non-text information in the target text. For example, in a chat scenario, only one emoji emoticon, such as that shown in fig. 6a, is included in the target text. Then, in the process of playing the target text, the mobile phone directly plays the audio information corresponding to the emoji expression symbol.

In other embodiments, the target text includes both textual and non-textual information. The mobile phone can have different playing modes in the process of playing the target text. The following will take a specific text as an example to describe a playing mode of the target text provided by the present application.

Taking the example that the target text is "you good, and happy, you are known," the mobile phone can play the audio information corresponding to the characters and the audio information corresponding to the non-character information in sequence according to the arrangement sequence of the characters and the non-character information in the target text in the process of playing the target text in a sounding manner. Specifically, the following is shown.

First, it is recognized that the position information of "jockey your good" in the target text is [1,5], punctuation marks, "[ 6,6], the position information of the expression symbol" ≡p "in the target text is [7,7], and the position information of" happy understanding your "in the target text is [8,13] in sequence. Then, the mobile phone acquires the audio information corresponding to the characters "king", "old", "teacher", "your", "good", "very good", "happy", "confirmatory", "acquaintance" and "+_", from a preset audio information base. Finally, the mobile phone plays the audio information of "king", the audio information of "old", the audio information of "teacher", the audio information of "your", the audio information of "good" and the audio information of "good" in the fourth play, pauses for a preset time (for example, 0.5 s) at the sixth play position (i.e., the "corresponding position") to indicate the pause of sentences, plays the audio information of "≡a" -, and plays the audio information of "very" in the seventh play. And by analogy, playing the audio information of 'happy', 'recognized', 'you' so as to finish the sound playing of the target text 'the king teacher is happy and recognizes you'.

It should be noted that, for example, when the target text includes the emoticon in the chat scene, the audio information corresponding to the emoticon is added to the text sound played by the mobile phone, so that the text information is expressed more vividly.

Taking the example that the target text is 'you should ensure that the information is accurate', the mobile phone can play the audio information corresponding to the lower score line as the background sound of the audio information of 'accurate' in the process of playing the audio information of 'accurate' of the words in the process of playing the audio information of the target text in a sounding manner. Specifically, the following is shown.

Firstly, the mobile phone recognizes that the position information of 'you should ensure that the information is accurate' in the target text is [1,11] in sequence, and the underlined position information is [8,11]. Then, the mobile phone respectively acquires audio information of "you", "should", "sure", "guarantee", "information", "all", "accurate", "sure", "no", "error" and underlined audio information from a preset audio information base. Finally, according to the above position information, playing audio information of "you" first, playing audio information of "should", and so on, playing audio information of "ensure", "information", "all", "accurate", "exact", "no", "error". And when the mobile phone starts playing the quasi audio information, the underlined audio information (such as the sound of the pencil line) is played until the wrong audio information is played. That is, in the process of playing audio information of the words "accurate", "exact", "no", "false", audio information corresponding to the underline is played.

Taking the example of the target text "what year is unknown from the world imperial palace ^①, the mobile phone can insert the annotation text" imperial palace corresponding to the annotation symbol ^① to refer to palace in the process of playing the audio information of the text "what year is unknown from the world imperial palace" in the process of playing the target text in sound, and the audio information of the name imperial palace "is called as palace because double-threshold is arranged outside the palace. Specifically, the following is shown.

First, the mobile phone recognizes that the "unknown zenith imperial palace" position information in the target text is [1,6], the "^①" position information in the text is [7,7], "position information is" [8,8], "what year the day is, and" 9,13]. Then, the mobile phone obtains the audio information of each word in "what is known about the world imperial palace, what is the year at the present time" from the preset audio information base, and the annotation word "imperial palace" refers to the palace, and the corresponding audio information of each word in the term imperial palace "is called because there is a double-threshold outside the palace. Finally, the mobile phone combines the audio information of the annotation text corresponding to the annotation symbol ^① to play the target text.

In one possible implementation manner, the mobile phone may sequentially play the audio information corresponding to the text and the audio information corresponding to the annotation text according to the arrangement sequence of the text and the annotation symbol in the target text. For example, the audio information of "unknown on the sky imperial palace" is sequentially played, then the annotation text "imperial palace" refers to a palace, the audio information of "imperial palace" is called because there is a double-threshold outside the palace, and finally the audio information corresponding to "what year is the next day" is played.

In another possible implementation manner, the mobile phone may play the audio information corresponding to the annotation text after playing the audio information of all the text of the sentence in which the non-text information is located. For example, the mobile phone may play the audio information corresponding to each word in "what is known in the sky imperial palace, what is the year at the present time," imperial palace playing the annotation word "refers to palace, and since there is a double-threshold outside the palace, the audio information corresponding to each word in" imperial palace "is called.

In this embodiment, the user can learn the text information in more detail by playing the audio information corresponding to the annotation text in the process of playing the audio information of the text.

Taking the example of the target text being program code such as that shown in fig. 7, the program code includes a plurality of typesetting control symbols therein. The mobile phone can insert the audio information corresponding to the typesetting control symbol in the process of playing the audio information of the program code words in the process of playing the target text in a sounding manner.

It should be noted that, in general, when the mobile phone displays the program code, the typesetting control symbol is not displayed, so as to avoid affecting the reading experience of the user. However, for convenience of description of the present embodiment, for example, as shown in fig. 7, the present embodiment shows program codes on which typesetting control symbols are displayed.

In the process of playing the seventh line of the program code in a sounding way, the mobile phone firstly recognizes that the position information of the tab symbol, "dependences", the space symbol, "{", and the "CRLF" included in the program code are 1,2, 3, 4 and 5 respectively. Then, the mobile phone respectively acquires the audio information of the tab symbol, "dependences", the space symbol, "{ and the carriage return symbol from a preset audio information base, and sequentially plays the audio information of the tab symbol," dependences ", the space symbol," { and the carriage return symbol according to the position information, so that typesetting information of the text is expressed in the process of audibly playing the text.

In one example, the audio information of the typesetting control symbols such as the tab symbol, the space symbol, the carriage return symbol and the like can respectively correspond to different sounds of clicking the keyboard, and the specific content of the audio information is not limited in this embodiment.

In the text, the space is generally used to separate two words and symbols. Therefore, although the space occupies one display position, the mobile phone may not play the corresponding audio information during the process of playing the target text in a sound manner.

Taking the example where the target text is the program code shown in fig. 8, the program code has a plurality of pieces of comment information therein. The text of lines 4 to 6 is annotation information, which includes annotation symbols "/" and "/", and annotation words "here need to specify jcenter specific url". Behavior 11 an annotation information comprising annotation symbols "///" and annotation words "versions must be 3.2.1 or more"

In one possible implementation, the terminal device may not play the annotation symbol and the audio information corresponding to the annotation text when encountering the annotation information during the audio playing of the program code. For example, in the process of playing the program code shown in fig. 8, the audio information of the text of lines 4 to 6 and 11 may not be played.

In another possible implementation manner, during the process of playing the program code in a sound manner, when the terminal device encounters the annotation information, the audio information corresponding to the annotation symbol and the annotation text can be played in sequence according to the position information of the annotation symbol and the annotation text. The audio information corresponding to the annotation symbol may be "ding-dong" or "code annotation", etc., and it should be noted that "ding-dong" is used for representing the sound effect of the audio information of the annotation symbol.

For example, the terminal device may play the 4 th to 6 th run-length codes of FIG. 8 as [ dingdong ] or [ code notes ] here need to specify jcenter's specific url ] or jcenter's specific url ] when playing the 4 th to 6 th run-length codes in a voiced manner. In the process of playing the line 11 text of fig. 8 in a sounding manner, the terminal device can play the text as a 'ding-dong' version which is required to be more than 3.2.1 or a 'code annotation version which is required to be more than 3.2.1'.

In addition, some first symbols may be included in the non-text information of the target text, which have different pronunciations in different language scenes. The first symbol may be, for example, "/", in the text "s/he," the "/" reads as "or", and in the text "if a=100, b=20, please calculate the value of a/b," the "/" reads as "divided by".

In order to enable the mobile phone to accurately play the first symbol in the target text in a sounding manner, referring to fig. 9, the present embodiment further provides a method for playing text in a sounding manner, which includes the following steps S901-S904.

S901, the mobile phone identifies a first symbol in a target text.

In this embodiment, a first symbol list is maintained in the mobile phone, where the first symbol list includes identification information of a plurality of first symbols. For each character in the text, the handset compares its identification information with the first list of symbols. If the identification information of the character can be found in the first symbol list, the character is determined to be the first character.

S902, the mobile phone determines the application type of the first symbol according to the semantics of the target text.

In the present embodiment, the application type is used to represent that one symbol is used as a punctuation mark, or as a mathematical symbol, or as an annotation symbol, or the like. And, the first symbol includes audio information corresponding to at least two application types.

In some embodiments, the mobile phone may identify the semantics of the target text according to the keywords in the target text, and determine the application type of the first symbol. For example, the type of the first symbol may be determined to be a mathematical symbol by identifying words such as "calculate," "numerical," "compare," "absolute," etc., to determine that the target text describes information associated with a mathematical operation. Taking the target text "if a=100, b=20, please calculate the value of a/b" as an example, the mobile phone can determine that the application type of "/" is a mathematical symbol according to words such as the words "=", "calculate", "value", and the like.

S903, the mobile phone determines the audio information corresponding to the first symbol according to the application type of the first symbol.

In a preset audio information base, the first symbol comprises audio information corresponding to at least two application types. Taking the first symbol as "/", in conjunction with Table 1, the audio information is audio information 8-1: [ or ], and audio information 8-2: [ divided ].

For the target text, "if a=100, b=20, please calculate the value of a/b", because the application type of "/" is a mathematical symbol, the corresponding audio information is audio information 8-2: [ divided ].

S904, the mobile phone plays the target text in a sound mode according to the audio information corresponding to the first symbol.

Taking the target text "if a=100, b=20, please calculate the value of a/b" as an example, the mobile phone can identify the position information of each character during the process of playing the target text in a sounding manner, wherein the position information of the punctuation mark "/" is [17,17]. In the process of sequentially playing the audio information of each character, the mobile phone plays the audio information 8-2 of "/" [ divided by ] at the playing time corresponding to the 17 th character. Thus the target text "if a=100, b=20, please calculate the value of a/b" reads aloud as [ if a equals one hundred b equals twenty please calculate the value of a divided by b ], rather than mistakenly reciting it as [ if a equals one hundred b equals twenty please calculate the value of a or b ].

It should be noted that, in the above embodiments, the order of obtaining the audio information from the audio information base and determining the position information by the mobile phone is not limited. That is, the mobile phone may determine the location information first and then acquire the audio data, or may acquire the audio data first and then determine the location information.

In summary, according to the method for playing the text with sound provided by the embodiment, the audio information corresponding to the non-text information can be played in the process of playing the target text with sound, so that the non-text information in the text is fully expressed, the expression effect of the terminal equipment on the text information is improved, and the user experience is improved.

Corresponding to the method for playing text in the foregoing embodiment, the present embodiment further provides a device for playing text in a voiced manner. For convenience of explanation, only portions relevant to the embodiments of the present application are shown.

Referring to fig. 10, the audio playback apparatus for text provided in the present embodiment includes an identification unit 1001, a determination unit 1002, and a playback control unit 1003.

And a recognition unit 1001 for recognizing non-text information in the target text.

A determining unit 1002, configured to determine audio information corresponding to the non-text information.

And a playing control unit 1003, configured to play the target text in a sound manner according to the audio information corresponding to the non-text information.

Optionally, the non-text information includes emoji, typesetting control symbols, punctuation, math symbols, annotating symbols, or character's characteristic font style.

Optionally, the determining unit 1002 is further configured to determine, according to the identification information of the non-text information, audio information corresponding to the non-text information from a preset audio information base.

Optionally, when the target text includes text and non-text information, the determining unit 1002 is further configured to determine, according to semantics of the target text, an application type of the first symbol, where the first symbol includes audio information corresponding to at least two application types, and determine, according to the application type of the first symbol, the audio information of the first symbol.

Optionally, when the target text includes text and the non-text information, the play control unit 1003 is further configured to play, if the non-text information is an emoji symbol, a typesetting control symbol, a punctuation mark, or a math symbol, the audio information corresponding to the text and the audio information corresponding to the non-text information in sequence according to the arrangement order of the text and the non-text information in the target text.

Optionally, when the target text includes text and the non-text information, the play control unit 1003 is further configured to identify an annotation text corresponding to the annotation symbol if the non-text information is the annotation symbol, and sequentially play audio information corresponding to the text and audio information corresponding to the annotation text according to the arrangement sequence of the text and the annotation symbol in the target text.

Optionally, when the target text includes text and the non-text information, the play control unit 1003 is further configured to identify an annotation text corresponding to the annotation symbol if the non-text information is the annotation symbol, and play audio information corresponding to the annotation text after playing audio information of all text of a sentence in which the non-text information is located.

Optionally, when the target text includes text and non-text information, the play control unit 1003 is further configured to play, if the non-text information is a characteristic font style, audio information corresponding to the characteristic font style as background sound of the audio information corresponding to the text having the characteristic font style while playing the audio information corresponding to the text having the characteristic font style.

The present embodiment also provides a terminal device, where the terminal device includes a speaker, a memory, a processor, and a computer program stored in the memory and capable of running on the processor, and the processor implements the method for playing text in sound provided in the above embodiment when executing the computer program. The terminal device may be as shown in fig. 1, for example.

The present embodiment also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements steps of the respective method embodiments described above.

The computer readable medium can include at least any entity or device capable of carrying computer program code to a font-audio playback device, a recording medium, a computer Memory, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), an electrical carrier signal, a telecommunications signal, and a software distribution medium. Such as a U-disk, removable hard disk, magnetic or optical disk, etc. In some jurisdictions, computer readable media may not be electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.

Embodiments of the present application also provide a computer program product comprising instructions. The computer program product, when run on a computer or processor, causes the computer or processor to perform one or more steps of any of the methods described above.

The embodiment of the application provides a chip system, which comprises a processor, wherein the processor is coupled with a memory, and the processor executes a computer program stored in the memory to realize the sound playing method of the text provided by the embodiment of the application. In this embodiment, the chip system may be a single chip, or a chip module formed by a plurality of chips.

In the foregoing embodiments, the descriptions of the embodiments are emphasized, and in part, not described or illustrated in any particular embodiment, reference is made to the related descriptions of other embodiments.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line), or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media.

Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.

Those of ordinary skill in the art will appreciate that implementing all or part of the above-described method embodiments may be accomplished by a computer program to instruct related hardware, the program may be stored in a computer readable storage medium, and the program may include the above-described method embodiments when executed. The storage medium includes various media capable of storing program codes such as ROM or random access memory RAM.

It should be noted that the above description is only a specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for playing text with sound, comprising:

Identify non-textual information in the target text;

Determining audio information corresponding to the non-text information;

Playing the target text audibly according to the audio information corresponding to the non-text information;

Among them, if the non-text information is a characteristic font style of text, then while playing the audio information corresponding to the text with the characteristic font style, the audio information corresponding to the characteristic font style is played as the background sound of the audio information corresponding to the text with the characteristic font style.

2. The method according to claim 1 is characterized in that the non-text information includes emoticons, typesetting control symbols, punctuation marks, mathematical symbols, annotation symbols, comment symbols or characteristic font styles of text.

3. The method according to claim 1, wherein when the target text includes text and the non-text information, determining the audio information corresponding to the non-text information comprises:

If the non-text information is a first symbol, then determining the application type of the first symbol according to the semantics of the target text; the first symbol includes at least two audio information corresponding to the application type;

Audio information of the first symbol is determined according to the application type of the first symbol.

4. The method according to claim 2, characterized in that when the target text includes text and the non-text information, playing the target text audibly according to the audio information corresponding to the non-text information comprises:

If the non-text information is the emoticon, the typesetting control symbol, the punctuation mark, the mathematical symbol or the annotation symbol, the audio information corresponding to the text and the audio information corresponding to the non-text information are played in sequence according to the arrangement order of the text and the non-text information in the target text.

5. The method according to claim 2, characterized in that when the target text includes text and the non-text information, playing the target text audibly according to the audio information corresponding to the non-text information comprises:

According to the arrangement order of the characters and the annotation symbols in the target text, the audio information corresponding to the characters and the audio information corresponding to the annotated characters are played in sequence.

6. The method according to claim 2, characterized in that when the target text includes text and the non-text information, playing the target text audibly according to the audio information corresponding to the non-text information comprises:

After playing the audio information of all the texts in the sentence where the non-text information is located, play the audio information corresponding to the annotation text.

7. The method according to any one of claims 1 to 6, characterized in that the step of determining the audio information corresponding to the non-text information comprises:

According to the identification information of the non-text information, the audio information corresponding to the non-text information is determined from a preset audio information library.

8. A text audio playback device, characterized in that the device comprises:

Recognition unit, used to recognize non-text information in the target text;

A determination unit, configured to determine audio information corresponding to the non-text information;

A playback control unit, configured to play the target text audibly according to the audio information corresponding to the non-text information;

9. A terminal device, comprising a speaker, a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium storing a computer program, wherein the computer program implements the method according to any one of claims 1 to 7 when executed by a processor.