
WO2023197809A1 - High-frequency audio signal encoding and decoding method and related apparatuses - Google Patents

Publication number
WO2023197809A1
WO2023197809A1 PCT/CN2023/081461 CN2023081461W WO2023197809A1 WO 2023197809 A1 WO2023197809 A1 WO 2023197809A1 CN 2023081461 W CN2023081461 W CN 2023081461W WO 2023197809 A1 WO2023197809 A1 WO 2023197809A1
Authority
WO
WIPO (PCT)
Prior art keywords
encoding
frequency
coding
audio signal
signal frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/081461
Other languages
French (fr)
Chinese (zh)
Inventor
梁俊斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Publication of WO2023197809A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Definitions

  • the present application relates to the field of computer technology, and in particular to the encoding and decoding of high-frequency audio signals.
  • Audio coding and decoding occupies an important position in modern communication systems. By compressing and encoding audio signals, the network bandwidth pressure of audio signals in network transmission can be reduced, and the storage and transmission costs of audio signals can be saved.
  • the high-frequency component of the audio signal (i.e., the high-frequency audio signal) has rich information and has a greater impact on the sound quality.
  • the loss of the high-frequency audio signal will lead to problems such as muffled sound, reduced intelligibility, and reduced fidelity.
  • Compared with the low-frequency component of the audio signal (i.e., the low-frequency audio signal), the high-frequency audio signal has a low energy proportion, few harmonic components, and low resolvability by the human ear, so it leaves a large space for coding compression.
  • the current high-frequency audio signal encoding methods either sacrifice encoding quality in order to reduce the number of encoding bits, or increase the number of encoding bits in order to improve encoding quality. It is difficult to achieve satisfactory results in both the number of encoding bits and encoding quality.
  • the present application provides a coding and decoding method for high-frequency audio signals and related devices.
  • When the coding quality permits, a coding method with a small number of coding bits can be selected, achieving relatively satisfactory results in both the number of coding bits and the coding quality: a lower number of encoding bits together with high-quality audio.
  • Embodiments of the present application provide a method for encoding high-frequency audio signals.
  • the method includes:
  • the coding method whose coding error is within the error preset interval is determined from the multiple coding methods as the target coding method.
  • the coding error of a coding method is generated by encoding the original high-frequency audio signal frame using that coding method.
  • the high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target encoding method is sent to the receiving end.
  • the high-frequency code stream has an encoding identifier, and the encoding identifier is used to indicate the encoding method used to obtain the high-frequency code stream.
  • embodiments of the present application provide another method for decoding high-frequency audio signals.
  • the method includes:
  • the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate the coding method used to encode the high-frequency code stream;
  • the high-frequency code stream is decoded according to the decoding method corresponding to the encoding method to obtain a high-frequency audio signal frame.
  • Embodiments of the present application provide a device for encoding high-frequency audio signals.
  • the device includes an acquisition unit, a determination unit, and a sending unit:
  • the acquisition unit is used to acquire multiple encoding methods and acquire original high-frequency audio signal frames decomposed from original audio signal frames;
  • the acquisition unit is also used to acquire the priorities corresponding to the multiple encoding methods. According to the order of the priorities from high to low, the number of encoding bits of the encoding methods increases;
  • the determination unit is configured to determine, from the plurality of coding methods, a coding method with a coding error within a preset error interval as a target coding method according to the priority of the coding method.
  • the coding error of a coding method is generated by encoding the original high-frequency audio signal frame using that coding method;
  • the sending unit is configured to send a high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target encoding method to the receiving end.
  • the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate the encoding method used to obtain the high-frequency code stream.
  • Embodiments of the present application provide another device for decoding high-frequency audio signals.
  • the device includes a receiving unit, an analysis unit, and a decoding unit:
  • the receiving unit is used to receive the high-frequency code stream sent by the transmitting end.
  • the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate the coding method used to encode the high-frequency code stream;
  • the analysis unit is used to analyze and obtain the encoding identifier corresponding to the high-frequency code stream, and determine the encoding method indicated by the encoding identifier;
  • the decoding unit is configured to decode the high-frequency code stream according to the decoding method corresponding to the encoding method to obtain a high-frequency audio signal frame.
  • embodiments of the present application provide a computer device, which includes a processor and a memory:
  • the memory is used to store a computer program and transmit the computer program to the processor
  • the processor is configured to execute the method described in any of the foregoing aspects according to instructions in the computer program.
  • embodiments of the present application provide a computer-readable storage medium used to store a computer program, and the computer program is used to execute the method described in any of the foregoing aspects.
  • embodiments of the present application provide a computer program product, including a computer program, which implements the method described in any of the foregoing aspects when executed by a processor.
  • This application proposes a high-frequency audio signal encoding and decoding method that mixes multiple encoding methods based on coding error judgment for the original high-frequency audio signal.
  • For each original audio signal frame in the original audio signal, multiple encoding methods are obtained, and the original high-frequency audio signal frame decomposed from the original audio signal frame is obtained.
  • The encoding methods have corresponding priorities; in order of priority from high to low, the number of coded bits of the encoding methods increases.
  • The coding method whose coding error is within the error preset interval is determined from the multiple coding methods as the target coding method.
  • The coding error of a coding method is generated by encoding the original high-frequency audio signal frame using that coding method, so that, with the coding error as the criterion, the target coding method can be determined with the optimal number of coding bits as the goal, and the high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target coding method is sent to the receiving end. Therefore, when the encoding quality permits, an encoding method with a small number of encoding bits is selected, which reduces the bandwidth of audio signal transmission.
  • The coding identifier is used to indicate the encoding method used to obtain the high-frequency code stream, so that the decoding end can determine, based on the coding identifier, which method to use to decode the received high-frequency code stream.
  • It can be seen that this application can choose a coding method with a small number of coded bits when the coding quality permits, achieving relatively satisfactory results in both the number of coded bits and the coding quality, with a lower number of coded bits and high-quality audio.
  • Figure 1 is an application scenario architecture diagram of a high-frequency audio signal encoding and decoding method provided by an embodiment of the present application
  • Figure 2 is a flow chart of a high-frequency audio signal encoding method provided by an embodiment of the present application
  • Figure 3 is a coding flow chart of an SBR method provided by an embodiment of the present application.
  • Figure 4 is a decoding flow chart of an SBR method provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of a low-frequency audio signal copy and correction of a high-frequency copy signal provided by an embodiment of the present application
  • Figure 6 is a coding flow chart of a CELP coding method provided by an embodiment of the present application.
  • Figure 7 is a decoding flow chart of a CELP encoding method provided by an embodiment of the present application.
  • Figure 8 is a flow chart of a coding error determination method provided by an embodiment of the present application.
  • Figure 9 is an acoustic equal loudness curve measured by the International Acoustic Standards Organization provided by the embodiment of the present application.
  • Figure 10 is a calculated auditory perception weighting coefficient diagram provided by an embodiment of the present application.
  • Figure 11 is a flow chart of a method for decoding high-frequency audio signals provided by an embodiment of the present application.
  • Figure 12 is an overall implementation architecture diagram of a high-frequency audio signal encoding and decoding method provided by an embodiment of the present application
  • Figure 13 is a structural diagram of a high-frequency audio signal encoding device provided by an embodiment of the present application.
  • Figure 14 is a structural diagram of a high-frequency audio signal decoding device provided by an embodiment of the present application.
  • Figure 15 is a structural diagram of a terminal provided by an embodiment of the present application.
  • Figure 16 is a structural diagram of a server provided by an embodiment of the present application.
  • Audio codec plays an important role in modern communication systems. For example, in a voice call application, the audio signal is collected through a microphone, and the analog audio signal is converted into a digital audio signal through an analog-to-digital conversion circuit. The digital audio signal is compressed by an encoder, and then packaged and sent to the receiver according to the communication network transmission format and protocol. After receiving the data packet, the receiving end unpacks and outputs the encoded code stream, and then regenerates the audio digital signal through the decoder. Finally, the audio digital signal is played through the speaker. Audio coding and decoding can effectively reduce the bandwidth of audio signal transmission, play a decisive role in saving audio signal storage and transmission costs, and ensuring the integrity of audio signals during communication network transmission.
  • the high-frequency audio signal of the audio signal has richer information, which has a greater impact on the sound quality.
  • the high-frequency audio signal also has a lower proportion of energy, fewer harmonic components, and lower resolvability by the human ear, so it has a large coding compression space.
  • the high-frequency audio signal coding methods provided by related technologies either sacrifice coding quality in order to reduce the number of coding bits (such as the blind expansion method), or increase the number of coding bits in order to improve the coding quality (such as the Code-Excited Linear Prediction (CELP) coding method); it is difficult to achieve satisfactory results in both the number of coding bits and the coding quality.
  • embodiments of the present application provide an encoding and decoding method for high-frequency audio signals. It mixes multiple coding methods based on coding error judgment.
  • When the coding quality allows, a coding method with a small number of coding bits can be selected, achieving relatively satisfactory results in both the number of coding bits and the coding quality, with a lower encoding bit count and high-quality audio.
  • the audio signal can be speech, music, etc.
  • Figure 1 shows an application scenario architecture diagram of a high-frequency audio signal encoding and decoding method.
  • This application scenario may include a sending end 101 and a receiving end 102.
  • the sending end 101 and the receiving end 102 can both be terminals, or the sending end 101 can be a terminal, the receiving end 102 can be a server, and so on.
  • the terminal may be, for example, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, an aircraft, etc., but is not limited thereto.
  • the server can be, for example, an independent server, a server in a cluster, or a cloud server.
  • When the sending end 101 is a terminal and the receiving end 102 is a server, the terminal and the server can be connected through wired or wireless means.
  • the embodiment of this application will be introduced using a voice call scenario as an example.
  • both the sending end 101 and the receiving end 102 can be terminals, and the terminal is a mobile phone.
  • the above-mentioned terminals and servers are computer equipment.
  • the sending end 101 can collect the original audio signal through the corresponding microphone (in this case, the audio signal can be the voice of the corresponding user of the sending end 101).
  • the sending end 101 can encode the original audio signal; the embodiment of this application mainly introduces the encoding of the original high-frequency audio signal in the original audio signal.
  • the sending end 101 can obtain multiple encoding methods, and obtain the original high-frequency audio signal frame decomposed from the original audio signal frame.
  • the encoding methods have corresponding priorities.
  • the priority of the method is used to indicate the priority of encoding using this encoding method. Normally, in order to reduce the bandwidth of audio signal transmission as much as possible, the number of encoding bits of the encoding method increases in order from high to low priority.
  • the sending end 101 determines the coding method with coding error within the preset error interval from multiple coding methods as the target coding method according to the priority of the coding method.
  • the coding error of the coding method is calculated by using the coding method to encode the original high-frequency audio signal frame.
  • the target encoding method can be determined with the encoding error as the criterion and the optimal number of encoding bits as the goal.
  • the transmitting end 101 sends the high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target encoding method to the receiving end 102, so that when the encoding quality allows, an encoding method with a small number of encoding bits is selected to reduce the bandwidth of audio signal transmission.
  • the encoding identifier is used to indicate the encoding method used to obtain the high-frequency code stream, so that the receiving end 102 can determine which encoding method to use to decode the received high-frequency code stream based on the encoding identifier.
  • the receiving end 102 determines the decoding method corresponding to the identified encoding method according to the encoding identification, thereby using the corresponding decoding method to decode the high-frequency code stream to obtain the high-frequency audio signal frame, and play it through the corresponding speaker.
  • embodiments of this application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, smart transportation, assisted driving, and vehicle-mounted scenarios, and are specifically applied to voice calls, video conferencing, human-computer interaction, and so on in these scenarios.
  • Figure 2 shows a flow chart of a method for encoding high-frequency audio signals. This embodiment can be executed by a computer device. The method includes:
  • the original audio signal may include high-frequency audio signals. Based on the characteristics of high-frequency audio signals, high-frequency audio signals have a large coding space.
  • embodiments of the present application provide a high-frequency audio signal encoding method that mixes multiple encoding methods based on coding error determination. Specifically, for each original audio signal frame in the original audio signal, the transmitting end can obtain multiple encoding methods, and obtain the original high-frequency audio signal frame decomposed from the original audio signal frame.
  • the multiple encoding methods available for selection can be any combination of existing encoding methods, such as the Speech Super Resolution (SSR) method, the Spectral Band Replication (SBR) method, the CELP coding method, and other methods.
  • the SSR method is a blind expansion method, while the SBR method and the CELP coding method are non-blind expansion methods.
  • the multiple coding methods can also include other encoding methods, which is not limited in the embodiments of this application.
  • the embodiments of this application are mainly introduced taking multiple coding methods including the SSR method, the SBR method and the CELP coding method as an example.
  • the SSR method is a blind frequency band expansion method, which can also be called a blind expansion method.
  • the SSR method does not send coding parameters to the receiving end when encoding, so this method does not occupy the number of coding bits.
  • When decoding at the receiving end, the high-frequency audio signal is mapped from the low-frequency audio signal, relying on the correlation that exists between the low-frequency audio signal and the high-frequency audio signal. This method aims to reconstruct a high-resolution audio signal with a lower-resolution audio signal as input.
  • the SSR method can use a neural network model to predict the characteristic information of the high-frequency audio signal from the characteristic information of the input low-frequency audio signal, thereby synthesizing the high-frequency audio signal.
  • the SBR method is a non-blind expansion method that requires reconstructing high-frequency audio signals based on a small number of coding parameters transmitted by the transmitter. Since high-frequency reconstruction is supported by auxiliary information, this method has better reconstruction quality.
  • Figure 3 is the encoding flow chart of the SBR mode
  • Figure 4 is the decoding flow chart of the SBR mode.
  • Figure 3 illustrates the encoding method of Advanced Audio Coding (AAC)+SBR
  • Figure 4 illustrates the corresponding AAC+SBR decoding method.
  • the original audio signal is first decomposed into a high-frequency audio signal and a low-frequency audio signal.
  • the high-frequency audio signal is obtained through a Quadrature Mirror Filter (QMF) filter bank, and the low-frequency audio signal is obtained through a 2:1 downsampler.
  • the low-frequency audio signal is encoded by an AAC encoder to generate the encoding parameters of the low-frequency audio signal, while the high-frequency audio signal is encoded by the SBR encoder.
  • the low-frequency audio signal is copied to the high-frequency band to obtain the high-frequency copy signal; the envelope features are then extracted and used to correct the high-frequency copy signal (the process is shown in Figure 5), and the encoding parameters are extracted and sent to the receiving end.
  • Figure 5 includes a schematic diagram of the high-frequency energy curve after directly copying the low-frequency audio signal to the high-frequency band to obtain the high-frequency copy signal 501; this high-frequency energy differs slightly from the actual high-frequency energy.
  • the envelope feature can more accurately reflect the high-frequency energy. Therefore, the high-frequency copy signal is corrected based on the envelope feature obtained by envelope extraction to obtain the high-frequency reconstructed signal 502.
  • the high-frequency energy curve obtained at this time can be seen in (b) in Figure 5.
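  • The copy-then-correct step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the envelope feature is a list of per-band energies and that correction is a per-band gain; the function names `band_energies` and `replicate_and_correct` are hypothetical and are not taken from the application or from any SBR implementation.

```python
import math
from typing import List, Sequence


def band_energies(spectrum: Sequence[float], num_bands: int) -> List[float]:
    """Energy (sum of squares) in each of num_bands equal-width bands."""
    size = len(spectrum) // num_bands
    return [
        sum(x * x for x in spectrum[b * size:(b + 1) * size])
        for b in range(num_bands)
    ]


def replicate_and_correct(low_band: Sequence[float],
                          target_envelope: Sequence[float]) -> List[float]:
    """Copy the low band to the high band, then scale each band so its
    energy matches the transmitted envelope (assumed per-band energies)."""
    num_bands = len(target_envelope)
    size = len(low_band) // num_bands
    copied = list(low_band[:size * num_bands])  # high-frequency copy signal
    copied_envelope = band_energies(copied, num_bands)
    corrected: List[float] = []
    for b in range(num_bands):
        if copied_envelope[b] > 0.0:
            gain = math.sqrt(target_envelope[b] / copied_envelope[b])
        else:
            gain = 0.0  # nothing to scale in an empty band
        corrected.extend(gain * x for x in copied[b * size:(b + 1) * size])
    return corrected


low_band = [1.0, 1.0, 2.0, 2.0]
target_envelope = [0.5, 2.0]  # hypothetical envelope of the true high band
reconstructed = replicate_and_correct(low_band, target_envelope)
print(band_energies(reconstructed, 2))
```

  • After correction the per-band energies of the reconstructed signal equal the target envelope, which mirrors the statement that SBR can ensure envelope matching but not a closer waveform match.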
  • the encoded high-frequency audio signal and low-frequency audio signal obtained through the above process can be combined by a bit stream multiplexer to obtain the corresponding encoded code stream.
  • the SBR method only needs to transmit limited parameters to the receiving end.
  • In the decoding process at the receiving end shown in Figure 4, a code stream decomposer decomposes the coded code stream into the encoded low-frequency audio signal and the encoded high-frequency audio signal.
  • the encoded low-frequency audio signal is decoded by the AAC decoder to generate the low-frequency audio signal, which passes through the QMF analysis filter and then participates in high-frequency reconstruction.
  • In the high-frequency reconstruction process, the SBR decoder decodes the required encoding parameters, and the low-frequency audio signal is copied to the high-frequency band to obtain the high-frequency copy signal.
  • the envelope features obtained by envelope extraction are used to correct the high-frequency copy signal to generate a high-frequency reconstructed signal.
  • the high- and low-frequency signals are aligned and combined into a full-band audio signal through a synthesis filter.
  • the CELP coding method is an effective speech compression coding method with a medium to low number of coding bits. It uses a codebook as the excitation source and has the advantages of low code rate, high synthetic speech quality, and strong anti-noise ability. It has been widely used at code rates of 4.8 to 16 kbps.
  • Figures 6 and 7 are respectively the encoding flow chart of the CELP encoding method and the decoding flow chart of the CELP encoding method.
  • the receiving end parses all coding parameters from the received data packet through the decoder, generates a fixed codebook excitation signal based on the fixed codebook and the fixed codebook gain, and generates an adaptive codebook excitation signal based on the adaptive codebook and the adaptive codebook gain.
  • the sum of the two excitations is filtered by a synthesis filter and post-processed to obtain the final audio signal.
  • the filter coefficients of the synthesis filter are obtained by interpolating the LSP parameters.
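  • The CELP decoding steps above (two gain-scaled excitations summed, then passed through a synthesis filter) can be sketched as follows. This is a simplified illustration under stated assumptions: the filter is an all-pole filter 1/A(z) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, the coefficients are given directly rather than derived by interpolating LSP parameters, the filter memory starts at zero, and post-processing is omitted. The function name `celp_synthesize` is hypothetical.

```python
from typing import List, Sequence


def celp_synthesize(fixed_vec: Sequence[float], fixed_gain: float,
                    adaptive_vec: Sequence[float], adaptive_gain: float,
                    lpc: Sequence[float]) -> List[float]:
    """Sum the two excitations and run them through 1/A(z).

    lpc holds a_1..a_p, so s[n] = e[n] - sum_k a_k * s[n - k].
    """
    # Total excitation: fixed codebook part plus adaptive codebook part.
    excitation = [
        fixed_gain * f + adaptive_gain * a
        for f, a in zip(fixed_vec, adaptive_vec)
    ]
    output: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a_k in enumerate(lpc, start=1):
            if n - k >= 0:  # zero initial filter memory
                acc -= a_k * output[n - k]
        output.append(acc)
    return output


# A single pulse through a one-pole filter decays geometrically.
print(celp_synthesize([1.0, 0.0, 0.0, 0.0], 1.0,
                      [0.0, 0.0, 0.0, 0.0], 0.5,
                      [-0.5]))
```

  • A real decoder would also update the adaptive codebook from past excitation and carry filter state across frames; both are left out here to keep the sketch short.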
  • the coding method has a corresponding priority.
  • the priority of the coding method is used to indicate the priority of using this coding method for encoding. Normally, in order to reduce the bandwidth of audio signal transmission as much as possible, the number of encoding bits of the encoding methods increases in order of priority from high to low; that is, the bandwidth occupied by transmission or the compressed storage space increases.
  • the encoding error of an encoding method is generated by encoding the original high-frequency audio signal frame using that encoding method.
  • In the SSR method, the high-frequency audio signal frame is predicted entirely from the characteristic information of the low-frequency audio signal frame.
  • the high-frequency and low-frequency audio signal frames have a certain correlation, but there is no absolute correspondence, so there is a large error between the high-frequency audio signal frame obtained by high-frequency reconstruction and the original high-frequency audio signal frame; the SBR method can only ensure envelope matching and cannot further reduce the error, but it takes up fewer coding bits; high-frequency reconstruction based on CELP coding uses LSP parameters to ensure that the reconstructed high-frequency audio signal frame has an envelope consistent with the original high-frequency audio signal frame, and at the same time uses codebook excitation to further reduce the error of the reconstructed high-frequency audio signal frame.
  • Based on the coding error judgment and the priority of the coding method (the priority level reflects the number of encoding bits), the embodiments of the present application select an appropriate encoding method to encode the current original high-frequency audio signal frame. Specifically, based on the priority of the encoding method, the sending end determines the encoding method whose encoding error is within the error preset interval from multiple encoding methods as the target encoding method.
  • the coding error is generated by encoding the original high-frequency audio signal frame using a coding method, and can be the difference between the reconstructed high-frequency audio signal frame (i.e., the high-frequency reconstructed signal frame) and the original high-frequency audio signal frame.
  • The coding error reflects the encoding quality: the smaller the encoding error, the higher the encoding quality. Considering priority and coding quality together, when the coding quality allows, a coding method with a small number of coded bits can be chosen, achieving a satisfactory effect in both the number of coded bits and the coding quality, with a lower number of coded bits and high-quality audio.
  • the error preset interval may be the range not exceeding an error threshold (Thrd). In this case, if the encoding error is less than or equal to Thrd, the encoding error is considered to be within the error preset interval; otherwise, the encoding error exceeds the error preset interval.
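  • As a hedged illustration of the error check, the sketch below measures the coding error as a weighted mean squared difference between the original and reconstructed frames and compares it with Thrd. The specific metric and the optional per-sample `weights` (a stand-in for a perceptual weighting) are assumptions made for this example; the text above only states that the error can be the difference between the two frames. The function names are hypothetical.

```python
from typing import Optional, Sequence


def coding_error(original: Sequence[float],
                 reconstructed: Sequence[float],
                 weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean squared difference between two equal-length frames."""
    if len(original) == 0 or len(original) != len(reconstructed):
        raise ValueError("frames must be non-empty and of equal length")
    if weights is None:
        weights = [1.0] * len(original)  # unweighted by default
    if len(weights) != len(original):
        raise ValueError("weights must match the frame length")
    weighted = sum(
        w * (o - r) ** 2
        for w, o, r in zip(weights, original, reconstructed)
    )
    return weighted / sum(weights)


def within_preset_interval(error: float, thrd: float) -> bool:
    """An error is within the preset interval when it does not exceed Thrd."""
    return error <= thrd


err = coding_error([1.0, 2.0], [1.0, 0.0])
print(err, within_preset_interval(err, thrd=2.0))
```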
  • the embodiments of this application provide multiple ways to implement S203.
  • the coding error of each of the multiple coding methods can be determined separately. If there are coding methods whose coding error is within the error preset interval, the one with the highest priority among them is determined as the target encoding method. Taking multiple coding methods including the SSR method, the SBR method and the CELP coding method as an example, the order of priority from high to low is the SSR method, the SBR method and the CELP coding method. The coding errors of the SSR method, the SBR method and the CELP coding method are determined respectively.
  • If the coding errors of several of these methods, including the SSR method, are within the error preset interval, the SSR method, having the highest priority, is used as the target encoding mode.
  • If only the coding error of the SSR method is within the error preset interval, the SSR mode is directly used as the target coding mode; or, if only the coding error of the SBR method is within the error preset interval, the SBR method is directly used as the target encoding method.
  • If the coding error of every coding method exceeds the error preset interval, the coding method can be selected according to the actual situation. For example, for scenes with higher quality requirements, the encoding method with the smallest encoding error is used as the target encoding method to ensure encoding quality. For another example, for scenarios with higher bandwidth requirements, the encoding method with the highest priority is used as the target encoding method.
  • Taking the SSR method, the SBR method and the CELP coding method as an example, the order of priority from high to low is the SSR method, the SBR method and the CELP coding method. The coding errors of the SSR method, the SBR method and the CELP coding method are determined respectively, and the coding error of every coding method exceeds the preset error interval.
  • In that case, for scenarios with higher quality requirements, the CELP coding method is used as the target coding method; for scenarios with high bandwidth requirements, since the SSR method has the highest priority, the SSR method is used as the target encoding method.
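  • The evaluate-all-then-choose variant described above can be sketched as follows. The sketch assumes the coding errors have already been computed and are passed in as a list ordered from highest to lowest priority; the function name `select_from_all_errors` and the `prefer_quality` flag (standing in for the "actual situation") are hypothetical.

```python
from typing import List, Tuple


def select_from_all_errors(errors: List[Tuple[str, float]],
                           thrd: float,
                           prefer_quality: bool) -> str:
    """Pick the target method once every method's error is known.

    errors is ordered from highest priority (fewest bits) to lowest.
    """
    within = [name for name, err in errors if err <= thrd]
    if within:
        # Highest-priority method among those within the preset interval.
        return within[0]
    if prefer_quality:
        # No method qualifies: fall back to the smallest coding error.
        return min(errors, key=lambda item: item[1])[0]
    # No method qualifies: fall back to the highest priority (fewest bits).
    return errors[0][0]


errors = [("SSR", 0.5), ("SBR", 0.05), ("CELP", 0.01)]
print(select_from_all_errors(errors, thrd=0.1, prefer_quality=True))
print(select_from_all_errors(errors, thrd=0.001, prefer_quality=True))
print(select_from_all_errors(errors, thrd=0.001, prefer_quality=False))
```

  • The error values in the usage lines are made-up numbers chosen only to exercise the three branches.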
  • If the encoding error of the currently selected encoding method is within the error preset interval, stop trying and select the currently selected encoding method as the target encoding method for encoding.
  • the sending end selects a pending encoding method from the multiple encoding methods in order of priority from high to low, and determines the encoding error of the pending encoding method. If the encoding error of the pending encoding method is within the error preset interval, the pending encoding method is determined as the target encoding method and no further pending encoding method is selected.
  • If the pending encoding method is the last encoding method among the multiple encoding methods (i.e., the encoding method with the lowest priority), the encoding errors of all previously tried encoding methods are beyond the error preset interval; the last encoding method can then be used directly as the target encoding method without performing the step of determining its encoding error.
  • Taking the SSR method, the SBR method and the CELP coding method, in order of priority from high to low, as an example: the SSR method is first selected as the pending coding method, and the coding error of the SSR method is determined. If the coding error of the SSR method is within the error preset interval, the SSR method is determined as the target encoding method; if the coding error of the SSR method exceeds the error preset interval, the SBR method is then selected as the pending encoding method and the coding error of the SBR method is determined.
  • If the coding error of the SBR method is within the error preset interval, the SBR method is determined as the target encoding method; if the coding error of the SBR method exceeds the error preset interval, the CELP encoding method is directly used as the target encoding method.
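  • The try-in-priority-order procedure can be sketched as a short loop. In this minimal sketch each encoding method is modeled as a hypothetical function that returns the reconstructed frame, the coding error is a plain mean squared difference (an assumption), and the three toy "codecs" are stand-ins that do not implement SSR, SBR or CELP.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]
# (identifier, function returning the reconstructed frame); hypothetical model
Codec = Tuple[int, Callable[[Frame], List[float]]]


def mean_squared_error(original: Frame, reconstructed: Frame) -> float:
    """Stand-in coding error: mean squared difference of the two frames."""
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def select_sequential(frame: Frame, codecs: List[Codec], thrd: float) -> int:
    """Try methods from highest to lowest priority; stop at the first one
    whose error is within the preset interval. The last method is used
    directly, without evaluating its error."""
    for codec_id, reconstruct in codecs[:-1]:
        if mean_squared_error(frame, reconstruct(frame)) <= thrd:
            return codec_id
    return codecs[-1][0]


# Toy stand-ins ordered by priority: 0 (fewest bits), 1, 2 (most bits).
toy_codecs: List[Codec] = [
    (0, lambda f: [0.0 for _ in f]),      # crude guess
    (1, lambda f: [0.9 * x for x in f]),  # closer approximation
    (2, lambda f: list(f)),               # exact copy
]
frame = [0.1, -0.2, 0.3, 0.4]
print(select_sequential(frame, toy_codecs, thrd=0.01))
```

  • Skipping the error evaluation for the last method saves one trial encoding per frame in the worst case, which matches the shortcut described in the text.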
  • the high-frequency code stream has an encoding identifier, and the encoding identifier is used to indicate the encoding method used to obtain the high-frequency code stream.
  • the high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target encoding method can be sent to the receiving end.
  • the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate the encoding method used to obtain the high-frequency code stream, so that the receiving end knows which encoding method's corresponding decoding method to use for decoding.
  • the encoding identifier can be used to uniquely identify the encoding method, and the encoding identifier can be in various possible forms, such as numbers, symbols, letters, etc.
  • the method includes:
  • a neural network model that can predict the high-frequency reconstructed signal frame based on a sample low-frequency audio signal frame is obtained.
  • when the neural network model is used, the original low-frequency audio signal frame is taken as its input, low-frequency features are extracted by the model, and the high-frequency reconstructed signal frame is then predicted from the low-frequency features. In some cases, the low-frequency features of the original low-frequency audio signal frame can instead be extracted by other means and used as the input of the neural network model, which then outputs the high-frequency reconstructed signal frame.
  • the neural network model may be a convolutional neural network (Convolutional Neural Networks, CNN), a long short-term memory network (Long Short-Term Memory, LSTM), etc., which are not limited in the embodiments of the present application.
  • S802 can be implemented by copying the original low-frequency audio signal frame to the high-frequency band to obtain a high-frequency copied signal frame.
  • when the low-frequency audio signal frame is copied directly to the high-frequency band, the high-frequency energy of the high-frequency copied signal frame differs slightly from the actual high-frequency energy.
  • the envelope feature reflects the high-frequency energy more accurately, so the envelope feature of the original high-frequency audio signal frame can be extracted
  • and then used to correct the high-frequency copied signal frame to obtain the high-frequency reconstructed signal frame.
  • any encoding method is a code-excited linear prediction method
  • S802 can be implemented by obtaining the encoding parameters from the high-frequency code stream, obtaining the pitch period of the original low-frequency audio signal frame, and then performing high-frequency reconstruction based on the encoding parameters and the pitch period to obtain the high-frequency reconstructed signal frame.
  • the coding parameters may include LSP parameters, codebook data (such as fixed codebook and adaptive codebook), and gain data (such as fixed codebook gain and adaptive codebook gain).
  • error analysis can be performed based on the high-frequency reconstructed signal frame and the original high-frequency audio signal frame to obtain the corresponding coding error.
  • the coding error can reflect the difference between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame.
  • the coding error is used to measure the coding quality of the coding method.
  • S803 can be implemented by calculating the difference signal between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame, and then using the difference signal to determine the coding error. If the high-frequency reconstructed signal frame is denoted S' and the original high-frequency audio signal frame is denoted S, subtracting S from S' gives a difference signal, which can be denoted Err.
  • since the difference signal already reflects the error between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame, in one possible implementation the difference signal can be used directly as the coding error, so that it accurately reflects the coding error of the encoding method.
  • the error reflected in the difference signal is the error of the signal itself. The signal usually needs to be played to the user, and the error at the level of the user's auditory perception may differ from the error of the signal itself. Therefore, in another possible implementation, psychoacoustic perception analysis can be applied to the error signal so that the size of the error at the auditory perception level is quantified. On this basis, when calculating the coding error, the auditory perception weighted energy of the difference signal is calculated to obtain the difference energy, the auditory perception weighted energy of the original high-frequency audio signal frame is calculated to obtain the original energy, and the ratio of the difference energy to the original energy is taken as the coding error. Both the difference energy and the original energy are auditory perception weighted energies. If the difference energy is expressed as EP_err(i) and the original energy is expressed as EP_s(i), the coding error can be calculated as the ratio w(i) = EP_err(i) / EP_s(i), where:
  • w(i) is the coding error
  • EP_err(i) is the difference energy
  • EP_s(i) is the original energy.
  • Thrd represents the error preset interval
  • if w(i) > Thrd, the encoding error exceeds the error preset interval; otherwise, it is within the error preset interval.
  • the coding error can be measured from the aspect of auditory perception, thereby ensuring the coding quality at the auditory perception level.
  • Figure 9 is an acoustic equal loudness curve measured by the International Acoustic Standards Organization provided by the embodiment of the present application.
  • the acoustic equal loudness curve describes the relationship between sound pressure level and frequency under equal loudness conditions, and is one of the important auditory characteristics. That is, it shows what sound pressure level an audio signal must reach at each frequency for the user to perceive the same loudness. To illustrate the meaning of this curve, take an example below, such as any equal loudness curve in Figure 9.
  • an analysis window of 20 ms is usually used for one frame (consistent with the encoder frame definition).
  • the window function can be a Hanning window or a Hamming window.
  • each frequency point k is multiplied by different auditory perception weighting coefficients and then accumulated to obtain the auditory perception weighted energy value of the audio signal of this frame.
  • the calculation formula is as follows:
  • EP(i) is the auditory perception weighted energy of the i-th frame audio signal
  • i is the frame number
  • k is the frequency point number
  • cof(k) is the auditory perception weighting coefficient of the k-th frequency point.
  • when the i-th frame audio signal is the original high-frequency audio signal frame, the calculated EP(i) is expressed as the original energy EP_s(i); when the i-th frame audio signal is the corresponding difference signal, the calculated EP(i) is expressed as the difference energy EP_err(i).
  • the embodiment of the present application uses psychoacoustic equal loudness curve data based on the BS3383 standard to calculate it.
  • freq represents the frequency point
  • cof(freq) is equivalent to the auditory perception weighting coefficient of the kth frequency point
  • loud represents the loudness value of the frequency point freq.
  • ff, af, bf, cf correspond to the data in the equal loudness curve data table disclosed in the BS3383 standard, which can be obtained through the equal loudness curve data table query
  • j is the number in the equal loudness curve data table
  • freq is the frequency point whose loudness value loud needs to be calculated.
  • the loudness value loud is calculated by using the linear interpolation method to interpolate the data in the equal loudness curve data table.
  • the freq of the loudness value calculated through the above formula is usually the frequency point corresponding to the number between j-1 and j.
  • the auditory perception weighting coefficient diagram calculated based on the above formula can be seen in Figure 10, which reflects the auditory perception weighting coefficient corresponding to different frequency points.
  • Embodiments of the present application also provide a method for decoding high-frequency audio signals. This method is introduced from the perspective of the receiving end. See Figure 11. The method includes:
  • the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate the coding method used to encode the high-frequency code stream.
  • S1102. Analyze and obtain the encoding identifier corresponding to the high-frequency code stream, and determine the encoding method indicated by the encoding identifier.
  • after the receiving end receives the high-frequency code stream, it can parse the high-frequency code stream to obtain the corresponding encoding identifier, and then decode the high-frequency code stream according to the decoding method corresponding to the encoding method indicated by the encoding identifier to obtain the high-frequency audio signal frame.
  • this application proposes a high-frequency audio signal encoding and decoding method based on the mixing of multiple encoding methods based on coding error judgment for the original high-frequency audio signal.
  • for each original audio signal frame in the original audio signal, obtain the original high-frequency audio signal frame decomposed from the original audio signal frame, and obtain multiple encoding methods.
  • the encoding methods have corresponding priorities.
  • the priority of an encoding method indicates the degree of preference for using that encoding method.
  • usually, in order to reduce the bandwidth of audio signal transmission as much as possible, the number of encoding bits of the encoding methods increases in order of priority from high to low.
  • the coding method with coding error within the error preset interval is determined from multiple coding methods as the target coding method.
  • the coding error of the coding method is generated by encoding the original high-frequency audio signal frame using the coding method.
  • the target coding method can be determined with the optimal number of coding bits as the goal, and the high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target coding method is sent to the receiving end. Therefore, when the encoding quality permits, the encoding method with a small number of encoding bits is selected, which reduces the bandwidth of audio signal transmission.
  • the encoding identifier is used to indicate the encoding method used to encode the high-frequency code stream, so that the decoding end can determine from the encoding identifier which decoding method to use for the received high-frequency code stream. It can be seen that this application can select an encoding method with a small number of encoding bits when the coding quality permits, achieving relatively satisfactory results in both the number of encoding bits and coding quality, with a lower number of encoding bits and high-quality audio.
  • This application also provides a coding and decoding method for high-frequency audio signals, which is introduced from the perspective of the overall architecture of the transmitter and receiver.
  • the embodiments of the present application take as an example the case where the multiple coding methods include the SSR method, the SBR method and the CELP coding method, whose priorities from high to low are the SSR method, the SBR method and the CELP coding method.
  • the overall implementation architecture of encoding and decoding high-frequency audio signals can be seen in Figure 12.
  • the original high-frequency audio signal frame and the original low-frequency audio signal frame are input (see 1201 in Figure 12).
  • the original high-frequency audio signal frame and the original low-frequency audio signal frame are obtained by decomposing the original audio signal frame into high and low frequencies (for example, through QMF filter bank decomposition); the original low-frequency audio signal frame can be used for subsequent high-frequency reconstruction.
  • in the high-frequency audio coding process, SSR encoding is attempted first to obtain the high-frequency code stream (see 1202 in Figure 12), and high-frequency reconstruction is then performed (see 1203 in Figure 12).
  • the high-frequency reconstructed signal frame obtained by the reconstruction and the original high-frequency audio signal frame are used to determine whether the coding error is within the error preset interval (see 1204 in Figure 12).
  • the high-frequency code stream and encoding identifier are obtained through analysis (see 1210 in Figure 12), and the high-frequency code stream is decoded using the decoding method corresponding to the encoding method indicated by the encoding identifier (see 1211 in Figure 12); after the above process, the high-frequency audio signal frame is obtained by decoding (see 1212 in Figure 12).
  • the high-frequency audio signal encoding device 1300 includes an acquisition unit 1301, a determination unit 1302 and a sending unit 1303:
  • the acquisition unit 1301 is used to acquire multiple encoding methods and acquire original high-frequency audio signal frames decomposed from original audio signal frames;
  • the obtaining unit 1301 is also used to obtain the priorities corresponding to the multiple coding methods. According to the order of the priorities from high to low, the number of coded bits of the coding method increases;
  • the determination unit 1302 is configured to determine, from the plurality of coding methods and according to the priority of the coding methods, a coding method whose coding error is within a preset error interval as the target coding method.
  • the coding error of a coding method is generated by encoding the original high-frequency audio signal frame using that coding method;
  • the sending unit 1303 is configured to send a high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target encoding method to the receiving end.
  • the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate the encoding method used to obtain the high-frequency code stream.
  • the determining unit 1302 is specifically used to:
  • the undetermined encoding method is determined as the target encoding method, and the continued selection of the undetermined encoding method is stopped.
  • the determining unit 1302 is specifically used to:
  • the coding method with the highest priority is determined as the target coding method.
  • the device further includes a reconstruction unit and an error analysis unit:
  • the acquisition unit 1301 is also used to acquire the original low-frequency audio signal frame decomposed from the original audio signal frame;
  • the reconstruction unit is configured to perform high-frequency reconstruction using any of the encoding methods according to the original low-frequency audio signal frame to obtain a high-frequency reconstructed signal frame;
  • the error analysis unit is configured to perform error analysis based on the high-frequency reconstructed signal frame and the original high-frequency audio signal frame to obtain the corresponding coding error.
  • the error analysis unit is specifically used to:
  • the coding error is determined using the difference signal.
  • the error analysis unit is specifically used to:
  • the ratio of the difference energy to the original energy is used as the coding error.
  • the reconstruction unit is specifically used to:
  • the high-frequency reconstructed signal frame is obtained by predicting through the neural network model.
  • the reconstruction unit is specifically used to:
  • the high-frequency replica signal frame is corrected using the envelope feature to obtain the high-frequency reconstructed signal frame.
  • the reconstruction unit is specifically used to:
  • High-frequency reconstruction is performed according to the coding parameters and the pitch period to obtain the high-frequency reconstructed signal frame.
  • the high-frequency audio signal decoding device 1400 includes a receiving unit 1401, parsing unit 1402 and decoding unit 1403:
  • the receiving unit 1401 is used to receive the high-frequency code stream sent by the transmitting end.
  • the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate the coding method used to encode the high-frequency code stream;
  • the analysis unit 1402 is used to analyze and obtain the encoding identifier corresponding to the high-frequency code stream, and determine the encoding method indicated by the encoding identifier;
  • the decoding unit 1403 is configured to decode the high-frequency code stream according to the decoding method corresponding to the encoding method to obtain a high-frequency audio signal frame.
  • An embodiment of the present application also provides a computer device that can execute a coding and decoding method for high-frequency audio signals.
  • the computer device may be, for example, a terminal. Taking the terminal as a smartphone as an example:
  • FIG. 15 shows a block diagram of a partial structure of a smart phone provided by an embodiment of the present application.
  • the smartphone includes: a radio frequency (RF) circuit 1510, memory 1520, input unit 1530, display unit 1540, sensor 1550, audio circuit 1560, wireless fidelity (WiFi) module 1570, processor 1580, power supply 1590 and other components.
  • the input unit 1530 may include a touch panel 1531 and other input devices 1532
  • the display unit 1540 may include a display panel 1541
  • the audio circuit 1560 may include a speaker 1561 and a microphone 1562.
  • the structure of the smart phone shown in Figure 15 does not constitute a limitation on the smart phone, which may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.
  • the memory 1520 can be used to store software programs and modules.
  • the processor 1580 executes various functional applications and data processing of the smart phone by running the software programs and modules stored in the memory 1520 .
  • the memory 1520 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), etc.; the storage data area may store data created through the use of the smartphone (such as audio data, phone books, etc.), etc.
  • memory 1520 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.
  • the processor 1580 is the control center of the smartphone. It uses various interfaces and lines to connect the various parts of the entire smartphone, and performs the various functions of the smartphone and processes data by running or executing software programs and/or modules stored in the memory 1520 and calling data stored in the memory 1520.
  • the processor 1580 may include one or more processing units; preferably, the processor 1580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, etc., and the modem processor mainly handles wireless communications. It can be understood that the above modem processor may also not be integrated into the processor 1580.
  • the processor 1580 in the smartphone can perform the following steps:
  • the coding method whose coding error is within the error preset interval is determined from the multiple coding methods as the target coding method.
  • the coding error of a coding method is generated by encoding the original high-frequency audio signal frame using that coding method.
  • the high-frequency code stream has an encoding identifier, and the encoding identifier is used to indicate the encoding method used to obtain the high-frequency code stream.
  • processor 1580 may perform the following steps:
  • the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate the coding method used to encode the high-frequency code stream;
  • the high-frequency code stream is decoded according to the decoding method corresponding to the encoding method to obtain a high-frequency audio signal frame.
  • FIG. 16 is a structural diagram of the server 1600 provided by the embodiment of the present application.
  • the server 1600 may vary greatly due to different configurations or performance, and may include one or more central processing units (CPU) 1622 (for example, one or more processors), memory 1632, and one or more storage media 1630 (for example, one or more mass storage devices) storing application programs 1642 or data 1644.
  • the memory 1632 and the storage medium 1630 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the server.
  • the central processor 1622 may be configured to communicate with the storage medium 1630 and execute a series of instruction operations in the storage medium 1630 on the server 1600 .
  • Server 1600 may also include one or more power supplies 1626, one or more wired or wireless network interfaces 1650, one or more input and output interfaces 1658, and/or, one or more operating systems 1641, such as Windows Server TM , Mac OS X TM , Unix TM , Linux TM , FreeBSD TM and so on.
  • the steps performed by the central processor 1622 in the server 1600 can be implemented based on the structure shown in FIG. 16 .
  • a computer-readable storage medium is provided.
  • the computer-readable storage medium is used to store a computer program.
  • the computer program is used to perform the encoding and decoding method of high-frequency audio signals described in the foregoing embodiments.
  • a computer program product or computer program includes computer instructions stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in various optional implementations of the above embodiments.
  • the disclosed systems, devices and methods can be achieved through other means.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part of it that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in various embodiments of this application.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disk and other media that can store program code.
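
The perceptually weighted coding error described in the embodiments above (the ratio of the difference energy EP_err(i) to the original energy EP_s(i), compared against Thrd) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, each frame is assumed to be given as per-bin power values, and the weighting coefficients cof(k) are passed in as a plain list rather than derived from the equal loudness curve data table.

```python
from typing import Sequence


def weighted_energy(power_spectrum: Sequence[float], cof: Sequence[float]) -> float:
    """Accumulate per-bin power multiplied by the perceptual weight cof(k).

    Assumption: power_spectrum[k] is the power of frequency point k for one frame.
    """
    if len(power_spectrum) != len(cof):
        raise ValueError("power spectrum and weights must have the same length")
    return sum(p * c for p, c in zip(power_spectrum, cof))


def coding_error(
    original_power: Sequence[float],
    difference_power: Sequence[float],
    cof: Sequence[float],
) -> float:
    """Return w(i) = EP_err(i) / EP_s(i) for one frame.

    difference_power is the power spectrum of the difference signal Err,
    not the difference of two power spectra.
    """
    ep_s = weighted_energy(original_power, cof)
    ep_err = weighted_energy(difference_power, cof)
    if ep_s == 0.0:
        # Hypothetical convention for a silent original frame.
        return 0.0 if ep_err == 0.0 else float("inf")
    return ep_err / ep_s


def within_preset_interval(w: float, thrd: float) -> bool:
    """The error exceeds the preset interval when w(i) > Thrd."""
    return w <= thrd
```

For instance, with weights `[1.0, 0.5]`, original power `[4.0, 4.0]` and difference power `[1.0, 1.0]`, the weighted energies are 6.0 and 1.5, so the coding error is 0.25.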

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Disclosed are a high-frequency audio signal encoding and decoding method and related apparatuses, which can be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, assisted driving, and vehicle-mounted scenarios. The encoding method comprises: acquiring a plurality of encoding modes and acquiring an original high-frequency audio signal frame decomposed from an original audio signal frame, wherein the encoding modes have corresponding priorities, and the number of encoding bits of the encoding modes increases in descending order of priority; according to the priorities of the encoding modes, determining, from among the plurality of encoding modes, an encoding mode having an encoding error within a preset error interval as a target encoding mode; and sending to a receiving end a high-frequency code stream obtained by encoding the original high-frequency audio signal frame by using the target encoding mode. According to the present application, an encoding mode having a small number of encoding bits is selected while encoding quality is ensured, thereby achieving satisfactory results in both the number of encoding bits and the encoding quality, with fewer encoding bits and high-quality audio.

Description

A high-frequency audio signal encoding and decoding method and related apparatuses

This application claims priority to Chinese patent application No. 202210395889.2, filed with the China Patent Office on April 15, 2022 and entitled "A high-frequency audio signal encoding and decoding method and related apparatuses", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of computer technology, and in particular to the encoding and decoding of high-frequency audio signals.

Background

Audio encoding and decoding occupies an important position in modern communication systems. Compressing and encoding audio signals reduces the network bandwidth pressure of transmitting audio signals over a network, and saves the storage and transmission costs of audio signals.

The high-frequency component of an audio signal (i.e., the high-frequency audio signal) carries rich information and has a considerable impact on sound quality. Loss of the high-frequency audio signal leads to problems such as muffled sound, reduced intelligibility, and reduced fidelity. Compared with the low-frequency component of the audio signal (i.e., the low-frequency audio signal), however, the high-frequency component has a lower share of energy, fewer harmonic components, and lower resolution by the human ear, so it leaves considerable room for coding compression.

Current high-frequency audio signal encoding methods either sacrifice encoding quality in order to reduce the number of encoding bits, or increase the number of encoding bits in order to improve encoding quality. It is difficult to achieve satisfactory results in both the number of encoding bits and encoding quality.

Summary of the invention

In order to solve the above technical problems, the present application provides an encoding and decoding method for high-frequency audio signals and related apparatuses. When the encoding quality permits, an encoding method with a small number of encoding bits can be selected, achieving relatively satisfactory results in both the number of encoding bits and encoding quality, with a lower number of encoding bits and high-quality audio.

The embodiments of this application disclose the following technical solutions:

In one aspect, an embodiment of the present application provides a method for encoding high-frequency audio signals, the method comprising:

obtaining multiple encoding methods, and obtaining an original high-frequency audio signal frame decomposed from an original audio signal frame;

obtaining the priorities respectively corresponding to the multiple encoding methods, where the number of encoding bits of the encoding methods increases in order of priority from high to low;

determining, according to the priorities of the encoding methods, an encoding method whose coding error is within the error preset interval from the multiple encoding methods as the target encoding method, where the coding error of an encoding method is generated by encoding the original high-frequency audio signal frame using that encoding method;

sending, to the receiving end, the high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target encoding method, where the high-frequency code stream has an encoding identifier, and the encoding identifier is used to indicate the encoding method used to obtain the high-frequency code stream.
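
The encoding steps above can be sketched as a priority-ordered search. This is a minimal illustration rather than the application's implementation: `EncodingMethod`, `select_target_method` and `encode_high_band` are hypothetical names, the error and encoder callables are placeholders to be supplied by the caller, and carrying the encoding identifier as a single leading byte is an assumption made only for this example. The fallback to the lowest-priority method without evaluating its error follows the behaviour described in the embodiments.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Frame = Sequence[float]


@dataclass(frozen=True)
class EncodingMethod:
    identifier: int                                # assumed to fit in one byte
    name: str
    encode: Callable[[Frame], bytes]               # high-band frame -> payload
    coding_error: Callable[[Frame, Frame], float]  # (high, low) frames -> error


def select_target_method(
    methods: Sequence[EncodingMethod],
    high_frame: Frame,
    low_frame: Frame,
    threshold: float,
) -> EncodingMethod:
    """Pick the first method, from high to low priority, whose error is acceptable.

    `methods` must already be ordered from highest to lowest priority.
    The last method is used directly, without computing its coding error.
    """
    if not methods:
        raise ValueError("at least one encoding method is required")
    for method in methods[:-1]:
        if method.coding_error(high_frame, low_frame) <= threshold:
            return method
    return methods[-1]


def encode_high_band(
    methods: Sequence[EncodingMethod],
    high_frame: Frame,
    low_frame: Frame,
    threshold: float,
) -> bytes:
    """Encode one frame and prepend the encoding identifier (assumed layout)."""
    target = select_target_method(methods, high_frame, low_frame, threshold)
    return bytes([target.identifier]) + target.encode(high_frame)
```

Because the list is ordered by priority and the number of encoding bits grows as priority falls, the search stops at the cheapest method whose error is acceptable.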

In another aspect, an embodiment of the present application provides a method for decoding a high-frequency audio signal, the method including:

receiving a high-frequency bitstream sent by a sending end, where the high-frequency bitstream carries an encoding identifier that indicates the encoding method used to produce the high-frequency bitstream;

parsing the high-frequency bitstream to obtain the corresponding encoding identifier, and determining the encoding method indicated by the encoding identifier; and

decoding the high-frequency bitstream with the decoding method corresponding to the encoding method, to obtain a high-frequency audio signal frame.

In another aspect, an embodiment of the present application provides an apparatus for encoding a high-frequency audio signal, the apparatus including an obtaining unit, a determining unit, and a sending unit:

the obtaining unit is configured to obtain multiple encoding methods, and to obtain an original high-frequency audio signal frame decomposed from an original audio signal frame;

the obtaining unit is further configured to obtain the priorities respectively corresponding to the multiple encoding methods, where the number of encoding bits of the encoding methods increases in descending order of priority;

the determining unit is configured to determine, according to the priorities of the encoding methods, an encoding method whose encoding error is within a preset error interval from the multiple encoding methods as a target encoding method, where the encoding error of an encoding method is produced by encoding the original high-frequency audio signal frame with that encoding method; and

the sending unit is configured to send, to a receiving end, a high-frequency bitstream obtained by encoding the original high-frequency audio signal frame with the target encoding method, where the high-frequency bitstream carries an encoding identifier that indicates the encoding method used to produce the high-frequency bitstream.

In another aspect, an embodiment of the present application provides an apparatus for decoding a high-frequency audio signal, the apparatus including a receiving unit, a parsing unit, and a decoding unit:

the receiving unit is configured to receive a high-frequency bitstream sent by a sending end, where the high-frequency bitstream carries an encoding identifier that indicates the encoding method used to produce the high-frequency bitstream;

the parsing unit is configured to parse the high-frequency bitstream to obtain the corresponding encoding identifier, and to determine the encoding method indicated by the encoding identifier; and

the decoding unit is configured to decode the high-frequency bitstream with the decoding method corresponding to the encoding method, to obtain a high-frequency audio signal frame.

In another aspect, an embodiment of the present application provides a computer device, the computer device including a processor and a memory:

the memory is configured to store a computer program and to transmit the computer program to the processor; and

the processor is configured to perform the method of any of the foregoing aspects according to instructions in the computer program.

In another aspect, an embodiment of the present application provides a computer-readable storage medium configured to store a computer program, the computer program being used to perform the method of any of the foregoing aspects.

In another aspect, an embodiment of the present application provides a computer program product including a computer program which, when executed by a processor, implements the method of any of the foregoing aspects.

As can be seen from the above technical solutions, for the original high-frequency audio signal of an original audio signal, the present application proposes a method for encoding and decoding high-frequency audio signals that mixes multiple encoding methods on the basis of an encoding-error decision. Specifically, for each original audio signal frame in the original audio signal, multiple encoding methods are obtained, together with the original high-frequency audio signal frame decomposed from the original audio signal frame. Each encoding method has a corresponding priority, and the number of encoding bits of the encoding methods increases in descending order of priority. Then, according to the priorities, an encoding method whose encoding error is within the preset error interval is determined from the multiple encoding methods as the target encoding method, where the encoding error of an encoding method is produced by encoding the original high-frequency audio signal frame with that encoding method. The target encoding method is thus determined with the encoding error as the decision criterion and the smallest number of encoding bits as the objective. The high-frequency bitstream obtained by encoding the original high-frequency audio signal frame with the target encoding method is sent to the receiving end, so that, where the encoding quality permits, an encoding method with a small number of encoding bits is used and the bandwidth needed for audio transmission is reduced. Because the high-frequency bitstream carries an encoding identifier that indicates the encoding method used to produce it, the decoding end can determine from the identifier which decoding method to apply to the received high-frequency bitstream. The present application can therefore select an encoding method with a small number of encoding bits where the encoding quality permits, achieving satisfactory results in both the number of encoding bits and the encoding quality: fewer encoding bits and high-quality audio.

Brief Description of the Drawings

Figure 1 is an architecture diagram of an application scenario of a method for encoding and decoding high-frequency audio signals provided by an embodiment of the present application;

Figure 2 is a flowchart of a method for encoding a high-frequency audio signal provided by an embodiment of the present application;

Figure 3 is an encoding flowchart of an SBR method provided by an embodiment of the present application;

Figure 4 is a decoding flowchart of an SBR method provided by an embodiment of the present application;

Figure 5 is a schematic diagram of copying a low-frequency audio signal and correcting the high-frequency replica signal, provided by an embodiment of the present application;

Figure 6 is an encoding flowchart of a CELP encoding method provided by an embodiment of the present application;

Figure 7 is a decoding flowchart of a CELP encoding method provided by an embodiment of the present application;

Figure 8 is a flowchart of a method for determining an encoding error provided by an embodiment of the present application;

Figure 9 is a graph of acoustic equal-loudness contours measured by an international acoustic standards organization, provided by an embodiment of the present application;

Figure 10 is a graph of calculated auditory perception weighting coefficients provided by an embodiment of the present application;

Figure 11 is a flowchart of a method for decoding a high-frequency audio signal provided by an embodiment of the present application;

Figure 12 is an overall implementation architecture diagram of a method for encoding and decoding high-frequency audio signals provided by an embodiment of the present application;

Figure 13 is a structural diagram of an apparatus for encoding a high-frequency audio signal provided by an embodiment of the present application;

Figure 14 is a structural diagram of an apparatus for decoding a high-frequency audio signal provided by an embodiment of the present application;

Figure 15 is a structural diagram of a terminal provided by an embodiment of the present application;

Figure 16 is a structural diagram of a server provided by an embodiment of the present application.

Detailed Description

The embodiments of the present application are described below with reference to the accompanying drawings.

Audio encoding and decoding play an important role in modern communication systems. In a voice call application, for example, the audio signal is captured by a microphone, and an analog-to-digital conversion circuit converts the analog audio signal into a digital audio signal. The digital audio signal is compressed by an encoder, then packetized according to the transmission format and protocol of the communication network and sent to the receiving end. After receiving a data packet, the receiving end unpacks it and outputs the encoded bitstream, which the decoder turns back into a digital audio signal that is finally played through a speaker. Audio encoding and decoding effectively reduce the bandwidth needed for audio transmission, and play a decisive role in saving audio storage and transmission costs and in preserving the integrity of the audio signal during transmission over the communication network.

The high-frequency part of an audio signal carries rich information and has a considerable influence on sound quality. Compared with the low-frequency part, the high-frequency audio signal has a lower share of the energy and weaker harmonic components, and the human ear resolves it less finely, so it leaves considerable room for compression.

The high-frequency audio signal encoding methods provided by the related art either sacrifice encoding quality to reduce the number of encoding bits (for example, blind bandwidth extension) or increase the number of encoding bits to improve encoding quality (for example, Code-Excited Linear Prediction (CELP) encoding). It is difficult for them to achieve satisfactory results in both the number of encoding bits and the encoding quality.

To solve the above technical problem, the embodiments of the present application provide a method for encoding and decoding high-frequency audio signals that mixes multiple encoding methods on the basis of an encoding-error decision. Where the encoding quality permits, an encoding method with a small number of encoding bits is selected, so that satisfactory results are achieved in both the number of encoding bits and the encoding quality: fewer encoding bits and high-quality audio. The audio signal may be speech, music, and so on.

Figure 1 shows an architecture diagram of an application scenario of a method for encoding and decoding high-frequency audio signals. The application scenario may include a sending end 101 and a receiving end 102. The sending end 101 and the receiving end 102 may both be terminals, or the sending end 101 may be a terminal and the receiving end 102 a server, and so on. A terminal may be, for example, a mobile phone, a computer, an intelligent voice interaction device, a smart home appliance, a vehicle-mounted terminal, or an aircraft, but is not limited to these. A server may be, for example, an independent server, a server in a cluster, or a cloud server. When the sending end 101 is a terminal and the receiving end 102 is a server, the terminal and the server may be connected in a wired or wireless manner. The embodiments of the present application are described using a voice call scenario as an example, in which both the sending end 101 and the receiving end 102 may be terminals, specifically mobile phones. The above terminals and servers are all computer devices.

In the voice call scenario, the sending end 101 may capture the original audio signal through its microphone (here the audio signal may be the voice of the user of the sending end 101). Before sending the original audio signal to the receiving end 102, the sending end 101 may encode it. The embodiments of the present application mainly describe the encoding of the original high-frequency audio signal within the original audio signal.

For each original audio signal frame in the original audio signal, the sending end 101 may obtain multiple encoding methods, and obtain the original high-frequency audio signal frame decomposed from the original audio signal frame. Each encoding method has a corresponding priority, which indicates the order in which the encoding methods are preferred for encoding. Usually, to reduce the bandwidth needed for audio transmission as far as possible, the number of encoding bits of the encoding methods increases in descending order of priority.

The sending end 101 then determines, according to the priorities of the encoding methods, an encoding method whose encoding error is within the preset error interval from the multiple encoding methods as the target encoding method, where the encoding error of an encoding method is produced by encoding the original high-frequency audio signal frame with that encoding method. The target encoding method is thus determined with the encoding error as the decision criterion and the smallest number of encoding bits as the objective. The sending end 101 sends the high-frequency bitstream obtained by encoding the original high-frequency audio signal frame with the target encoding method to the receiving end 102, so that, where the encoding quality permits, an encoding method with a small number of encoding bits is used and the bandwidth needed for audio transmission is reduced. Because the high-frequency bitstream carries an encoding identifier that indicates the encoding method used to produce it, the receiving end 102 can determine from the identifier which decoding method to apply to the received high-frequency bitstream.

The receiving end 102 determines, from the encoding identifier, the decoding method corresponding to the identified encoding method, decodes the high-frequency bitstream with that decoding method to obtain the high-frequency audio signal frame, and plays it through its speaker.
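The role of the encoding identifier can be illustrated with a minimal sketch. It assumes a one-byte identifier placed in front of the payload; the identifier values, the frame layout, and the names `pack_frame`, `parse_frame`, and `decode_frame` are hypothetical and are not specified by the source.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical one-byte encoding identifiers; the source does not give values.
SSR_ID, SBR_ID, CELP_ID = 0, 1, 2


def pack_frame(encoding_id: int, payload: bytes) -> bytes:
    """Sending end: prefix the high-frequency payload with its identifier."""
    return bytes([encoding_id]) + payload


def parse_frame(frame: bytes) -> Tuple[int, bytes]:
    """Receiving end: split a frame into (encoding identifier, payload)."""
    if not frame:
        raise ValueError("empty frame")
    return frame[0], frame[1:]


def decode_frame(
    frame: bytes,
    decoders: Dict[int, Callable[[bytes], List[float]]],
) -> List[float]:
    """Pick the decoder that matches the identifier and run it on the payload."""
    encoding_id, payload = parse_frame(frame)
    if encoding_id not in decoders:
        raise ValueError(f"unknown encoding identifier: {encoding_id}")
    return decoders[encoding_id](payload)
```

In a real codec the identifier would probably occupy only a few bits of the frame header; a whole byte is used here purely to keep the sketch short.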

It should be noted that the embodiments of the present application may be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving, and in-vehicle scenarios, and specifically to voice calls, video conferencing, human-computer interaction, and the like within these scenarios.

Next, the method for encoding a high-frequency audio signal provided by the embodiments of the present application is described in detail from the perspective of the sending end, with reference to the accompanying drawings.

Figure 2 shows a flowchart of a method for encoding a high-frequency audio signal. This embodiment may be performed by a computer device, and the method includes:

S201. Obtain multiple encoding methods, and obtain the original high-frequency audio signal frame decomposed from the original audio signal frame.

When the sending end obtains the original audio signal, it may first encode the signal and transmit the encoded bitstream to the receiving end, which effectively reduces the bandwidth needed for audio transmission. The original audio signal may include a high-frequency audio signal, whose characteristics leave considerable room for compression. For encoding the high-frequency audio signal, the embodiments of the present application provide an encoding method that mixes multiple encoding methods on the basis of an encoding-error decision. Specifically, for each original audio signal frame in the original audio signal, the sending end may obtain multiple encoding methods, and obtain the original high-frequency audio signal frame decomposed from the original audio signal frame.

In the embodiments of the present application, the multiple encoding methods available for selection may be any of the existing encoding methods, for example a combination of two or more of the audio super-resolution (Speech Super Resolution, SSR) method, the Spectral Band Replication (SBR) method, and the CELP encoding method. The SSR method is a blind extension method, whereas the SBR method and the CELP encoding method are non-blind extension methods. The multiple encoding methods may also include other encoding methods, which the embodiments of the present application do not limit. The embodiments are mainly described using the example in which the multiple encoding methods include the SSR method, the SBR method, and the CELP encoding method.

The encoding and decoding principles of each encoding method are introduced in turn below:

The SSR method is a blind bandwidth extension method, also called a blind extension method. When encoding, the SSR method sends no encoding parameters to the receiving end, so it consumes no encoding bits. When decoding, the receiving end relies on the correlation between the low-frequency and high-frequency audio signals and maps the low-frequency audio signal to a high-frequency audio signal by a prediction method, such as a neural network model from deep learning. The aim is to reconstruct a high-resolution audio signal from a lower-resolution audio signal as input. With the rapid development of deep neural networks, the SSR method can use a neural network model to predict the feature information of the high-frequency audio signal from the feature information of the input low-frequency audio signal, thereby synthesizing the high-frequency audio signal.

The SBR method is a non-blind extension method that reconstructs the high-frequency audio signal from a small number of encoding parameters transmitted by the sending end. Because the high-frequency reconstruction is supported by this side information, the method offers better reconstruction quality. Figure 3 is the encoding flowchart of the SBR method and Figure 4 is its decoding flowchart. Figure 3 illustrates Advanced Audio Coding (AAC) + SBR encoding, and Figure 4 illustrates the corresponding AAC + SBR decoding. In Figure 3, the original audio signal is first decomposed into a high-frequency audio signal and a low-frequency audio signal: for example, the high-frequency audio signal is obtained through a Quadrature Mirror Filter (QMF) bank, and the low-frequency audio signal is obtained through a 2:1 downsampler. The low-frequency audio signal is encoded by an AAC encoder, which generates its encoding parameters, while the high-frequency audio signal is encoded by an SBR encoder. The low-frequency audio signal is copied to the high-frequency band to obtain a high-frequency replica signal, envelope features are obtained by envelope extraction and used to correct the high-frequency replica signal (the process is shown in Figure 5), and the encoding parameters are extracted for sending to the receiving end. Part (a) of Figure 5 is a schematic high-frequency energy curve after the low-frequency audio signal is copied directly to the high-frequency band to obtain the high-frequency replica signal 501. This energy differs slightly from the actual high-frequency energy, and the envelope features reflect the high-frequency energy more accurately. The high-frequency replica signal is therefore corrected with the envelope features obtained by envelope extraction to obtain the high-frequency reconstructed signal 502, whose high-frequency energy curve is shown in part (b) of Figure 5. The encoded high-frequency and low-frequency audio signals obtained in this way are combined by a bitstream multiplexer into the corresponding encoded bitstream.
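The copy-and-correct step of Figure 5 can be illustrated with a simplified sketch. It assumes that both bands are represented as equally long lists of spectral magnitudes and that the envelope is the mean energy of equal-width sub-bands; real SBR operates on QMF sub-band samples with additional tools, which are not modeled here. The function names are hypothetical.

```python
import math
from typing import List


def band_energies(spectrum: List[float], bands: int) -> List[float]:
    """Mean energy of each of `bands` equal-width sub-bands (the envelope)."""
    width = len(spectrum) // bands
    if width == 0:
        raise ValueError("more sub-bands than spectral values")
    return [
        sum(x * x for x in spectrum[b * width:(b + 1) * width]) / width
        for b in range(bands)
    ]


def correct_replica(low: List[float], envelope: List[float]) -> List[float]:
    """Copy the low band upward, then rescale each sub-band to the envelope."""
    bands = len(envelope)
    width = len(low) // bands
    replica = list(low[:width * bands])       # high-frequency replica signal
    replica_energy = band_energies(replica, bands)
    corrected: List[float] = []
    for b in range(bands):
        # Gain that makes the replica's sub-band energy equal the envelope.
        if replica_energy[b] > 0:
            gain = math.sqrt(envelope[b] / replica_energy[b])
        else:
            gain = 0.0
        corrected.extend(gain * x for x in replica[b * width:(b + 1) * width])
    return corrected
```

In this sketch only the envelope (one value per sub-band) would need to be transmitted, which is why the approach needs few encoding bits.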

As the above description of the SBR encoding process shows, the SBR method needs to transmit only a limited set of parameters to the receiving end.

In the decoding process at the receiving end shown in Figure 4, a bitstream demultiplexer splits the encoded bitstream into the encoded low-frequency audio signal and the encoded high-frequency audio signal. The encoded low-frequency audio signal is decoded first: the AAC decoder generates the low-frequency audio signal, which passes through a QMF analysis filter and then takes part in high-frequency reconstruction. In the high-frequency reconstruction, the SBR decoder decodes the required encoding parameters, and the low-frequency audio signal is copied to the high-frequency band to obtain the high-frequency replica signal. The envelope features obtained by envelope extraction are used to correct the high-frequency replica signal and generate the high-frequency reconstructed signal. Finally, after a delay that aligns the high-frequency and low-frequency signals, a synthesis filter merges them into a full-band audio signal.

The CELP encoding method is an effective speech compression method for medium-to-low numbers of encoding bits. It uses a codebook as the excitation source and offers a low bit rate, high synthesized speech quality, and strong noise robustness. It is widely used at bit rates of 4.8 to 16 kbps, and encoders based on CELP currently exist in many variants. Figure 6 and Figure 7 are the encoding flowchart and the decoding flowchart of the CELP encoding method, respectively. In Figure 6, after the original audio signal is preprocessed, for example by high-pass filtering, Linear Predictive Coding (LPC) yields a set of linear prediction filter coefficients, and the LPC parameters (for example, the linear prediction filter coefficients) are converted into line spectral pair (LSP) parameters and quantized to facilitate transmission to the receiving end. The difference between the preprocessed original audio signal s(n) and the LPC prediction filtering result is the residual signal. The residual signal passes through a perceptual weighting filter to give the filtered residual signal e(n). Based on e(n), and following the principle of minimum perceptually weighted error, the best fixed codebook and adaptive codebook entries are searched for, and the fixed codebook gain (Gc) and the adaptive codebook gain (Ga) are calculated. The encoding parameters obtained in this process are packetized and transmitted to the receiving end.
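The residual computation described above can be sketched as follows. The sketch assumes the common convention that the prediction is a weighted sum of the preceding samples, treats samples before the start of the frame as zero, and omits the perceptual weighting filter; the function name and the coefficient convention are assumptions rather than details from the source.

```python
from typing import List


def lpc_residual(s: List[float], a: List[float]) -> List[float]:
    """Residual = s(n) minus the linear prediction sum_k a[k-1] * s(n-k).

    `a` holds the linear prediction filter coefficients a_1..a_p.
    Samples before the start of the frame are treated as zero.
    """
    residual = []
    for n in range(len(s)):
        prediction = sum(
            a[k - 1] * s[n - k]
            for k in range(1, len(a) + 1)
            if n - k >= 0
        )
        residual.append(s[n] - prediction)
    return residual
```

For a signal that the predictor models well, the residual has much less energy than the signal itself, which is what makes it cheap to represent with codebook excitations.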

In the decoding process shown in Figure 7, the decoder at the receiving end parses all encoding parameters from the received data packet. It generates the fixed codebook excitation signal from the fixed codebook and the fixed codebook gain, and the adaptive codebook excitation signal from the adaptive codebook and the adaptive codebook gain. The sum of the two excitations is filtered by a synthesis filter and post-processed to obtain the final audio signal. The filter coefficients of the synthesis filter are obtained by interpolating the LSP parameters.
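The excitation-plus-synthesis step of Figure 7 can be sketched as below. The sketch assumes the synthesis filter is the all-pole filter 1/A(z) with zero initial state and takes the filter coefficients as already given; LSP interpolation and post-processing are omitted, and the function name is hypothetical.

```python
from typing import List


def celp_synthesize(
    fixed_vec: List[float],
    adaptive_vec: List[float],
    gc: float,
    ga: float,
    a: List[float],
) -> List[float]:
    """Sum the two scaled excitations and run them through 1/A(z).

    y(n) = u(n) + sum_k a[k-1] * y(n-k), where u(n) is the total excitation.
    """
    excitation = [gc * c + ga * v for c, v in zip(fixed_vec, adaptive_vec)]
    output: List[float] = []
    for n, u in enumerate(excitation):
        feedback = sum(
            a[k - 1] * output[n - k]
            for k in range(1, len(a) + 1)
            if n - k >= 0
        )
        output.append(u + feedback)
    return output
```

This filter is the inverse of the prediction-error filter used at the encoder, so feeding it the exact residual would reproduce the original frame.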

S202. Obtain the priorities respectively corresponding to the multiple encoding methods, where the number of encoding bits of the encoding methods increases in descending order of priority.

In the embodiments of the present application, each encoding method has a corresponding priority, which indicates the order in which the encoding methods are preferred for encoding. Usually, to reduce the bandwidth needed for audio transmission as far as possible, the number of encoding bits of the encoding methods increases in descending order of priority; that is, the transmission bandwidth or compressed storage space they occupy increases.

When the multiple encoding methods include the SSR method, the SBR method, and the CELP encoding method, their numbers of encoding bits, from smallest to largest, are in the order SSR, SBR, CELP. The priorities, from highest to lowest, are therefore in the same order: SSR, SBR, CELP.
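The relationship between encoding bits and priority can be expressed as a simple sort. The bit counts in the usage note are illustrative placeholders and are not figures from the source; only the ordering SSR < SBR < CELP is.

```python
from typing import Dict, List


def priority_order(bits_per_frame: Dict[str, int]) -> List[str]:
    """Return method names from highest to lowest priority.

    Fewer encoding bits means higher priority, so the list is sorted by
    ascending bit count.
    """
    return sorted(bits_per_frame, key=lambda name: bits_per_frame[name])
```

For example, with the hypothetical counts `{"CELP": 64, "SSR": 0, "SBR": 16}`, the function returns the methods in the order SSR, SBR, CELP.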

S203. According to the priorities of the encoding methods, determine an encoding method whose encoding error is within the preset error interval from the multiple encoding methods as the target encoding method, where the encoding error of an encoding method is produced by encoding the original high-frequency audio signal frame with that encoding method.

In the related art, the blind extension method has the advantage of performing high-frequency reconstruction without consuming any encoding bits, but its high-frequency audio signal frame is predicted entirely from the feature information of the low-frequency audio signal frame. Although the high-frequency and low-frequency audio signal frames are correlated to some extent, there is no exact correspondence between them, so the reconstructed high-frequency audio signal frame differs considerably from the original high-frequency audio signal frame. The SBR method consumes a small number of encoding bits, but it can only guarantee that the envelopes match and cannot reduce the error further. High-frequency reconstruction based on the CELP encoding method uses the LSP parameters to ensure that the reconstructed high-frequency audio signal frame has the same envelope as the original, and uses codebook excitation to further reduce the error between the two, but it consumes more encoding bits. Based on this analysis, the goal of the embodiments of the present application is to achieve satisfactory results in both the number of encoding bits and the encoding quality (that is, a small error). To this end, the embodiments select a suitable encoding method for the current original high-frequency audio signal frame on the basis of an encoding-error decision and the priorities of the encoding methods (the priority reflects the number of encoding bits). Specifically, the sending end determines, according to the priorities of the encoding methods, an encoding method whose encoding error is within the preset error interval from the multiple encoding methods as the target encoding method, where the encoding error of an encoding method is produced by encoding the original high-frequency audio signal frame with that encoding method.

The coding error is produced by encoding the original high-frequency audio signal frame with the coding method. It may be the error between the reconstructed high-frequency audio signal frame (that is, the high-frequency reconstructed signal frame) and the original high-frequency audio signal frame, and it reflects the coding quality: the smaller the coding error, the higher the coding quality. By considering priority and coding quality together, a coding method with a small number of coding bits can be chosen whenever the coding quality permits, which gives satisfactory results in both respects, namely fewer coding bits and high-quality audio. The preset error interval may be defined by an error threshold (Thrd): when the coding error is less than or equal to Thrd, the coding error is considered to be within the preset error interval; otherwise, the coding error exceeds the preset error interval.

It should be noted that the embodiments of the present application provide multiple ways to implement S203. In one possible implementation, the coding error of each of the multiple coding methods is determined separately. If one or more coding methods have a coding error within the preset error interval, the one with the highest priority among them is determined as the target coding method. Take the case where the multiple coding methods are the SSR method, the SBR method and the CELP coding method, with priority from high to low in that order. The coding errors of the SSR method, the SBR method and the CELP coding method are determined separately. If the coding errors of both the SSR method and the SBR method are within the preset error interval, the SSR method is used as the target coding method because its priority is higher than that of the SBR method. Of course, if only the SSR method has a coding error within the preset error interval, the SSR method is used directly as the target coding method, and if only the SBR method has a coding error within the preset error interval, the SBR method is used directly as the target coding method.

If no coding method has a coding error within the preset error interval, the coding method can be selected according to the actual situation. For example, in a scenario with high quality requirements, the coding method with the smallest coding error is used as the target coding method to safeguard coding quality. In a scenario with strict bandwidth requirements, the coding method with the highest priority is used as the target coding method. Continuing the example in which the priority from high to low is the SSR method, the SBR method and the CELP coding method: the coding errors of the three methods are determined separately, and all of them exceed the preset error interval. In the quality-sensitive scenario, the CELP coding method has the smallest coding error and therefore the highest coding quality, so it is used as the target coding method. In the bandwidth-sensitive scenario, the SSR method has the highest priority, so it is used as the target coding method.
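
The first implementation of S203, including the fallback when no coding method meets the threshold, can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function name, the `prefer_quality` flag and the representation of coding errors as plain floats are assumptions made for the example.

```python
from typing import Dict, List


def select_by_full_evaluation(
    priority_order: List[str],
    errors: Dict[str, float],
    thrd: float,
    prefer_quality: bool = True,
) -> str:
    """Pick the target coding method once every method's error is known.

    priority_order lists the methods from highest to lowest priority.
    errors maps each method to its coding error (hypothetical float values).
    """
    # Methods whose coding error is within the preset error interval (<= Thrd),
    # kept in priority order.
    within = [m for m in priority_order if errors[m] <= thrd]
    if within:
        return within[0]  # highest priority among the acceptable methods
    if prefer_quality:
        # Quality-sensitive scenario: take the smallest coding error.
        return min(priority_order, key=lambda m: errors[m])
    # Bandwidth-sensitive scenario: take the highest priority.
    return priority_order[0]
```

With the priority order SSR, SBR, CELP, the function returns SSR whenever SSR is within the threshold, even if SBR or CELP has a smaller error, which matches the preference for fewer coding bits.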

In another possible implementation, the coding methods are tried step by step in order of priority from high to low, checking in turn whether the coding error of each method is within the preset error interval. As soon as the coding error of the currently selected coding method is within the preset error interval, the attempts stop and that method is used as the target coding method. Specifically, the sending end selects a candidate coding method from the multiple coding methods in order of priority from high to low and determines its coding error. If the coding error of the candidate coding method is within the preset error interval, the candidate is determined as the target coding method and no further candidates are selected. If the candidate is the last of the multiple coding methods (that is, the one with the lowest priority), the coding errors of all previously tried methods exceeded the preset error interval. In that case the step of determining the coding error can be skipped for the last coding method, and it is used directly as the target coding method.

Continuing the example in which the priority from high to low is the SSR method, the SBR method and the CELP coding method: the SSR method is selected first as the candidate coding method and its coding error is determined. If the coding error of the SSR method is within the preset error interval, the SSR method is determined as the target coding method. If it exceeds the preset error interval, the SBR method is selected as the next candidate and its coding error is determined. If the coding error of the SBR method is within the preset error interval, the SBR method is determined as the target coding method. If it exceeds the preset error interval, the CELP coding method is used directly as the target coding method.
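
The step-by-step variant can be sketched as below. It is an illustrative sketch only: `error_of` is a hypothetical callback that encodes and reconstructs the frame with one method and returns its coding error, and the early return for the last method mirrors the rule that its error need not be computed.

```python
from typing import Callable, List


def select_by_ladder(
    priority_order: List[str],
    error_of: Callable[[str], float],
    thrd: float,
) -> str:
    """Try methods from highest to lowest priority and stop at the first fit."""
    if not priority_order:
        raise ValueError("at least one coding method is required")
    # Evaluate every method except the last one, in priority order.
    for method in priority_order[:-1]:
        if error_of(method) <= thrd:
            return method  # within the preset error interval: stop trying
    # All earlier methods exceeded the interval; use the last one without
    # computing its coding error.
    return priority_order[-1]
```

Compared with evaluating every method, this variant avoids running the more expensive coding methods when a cheaper one is already acceptable.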

S204. Send the high-frequency bitstream obtained by encoding the original high-frequency audio signal frame with the target coding method to the receiving end, where the high-frequency bitstream carries a coding identifier that indicates the coding method used to produce the high-frequency bitstream.

Once the target coding method is determined, the high-frequency bitstream obtained by encoding the original high-frequency audio signal frame with the target coding method can be sent to the receiving end. The high-frequency bitstream carries a coding identifier that indicates the coding method used to produce it, so that the receiving end knows which decoding method, corresponding to which coding method, to use for decoding. The coding identifier uniquely identifies a coding method and may take various forms, such as digits, symbols or letters.
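
One simple way to attach a coding identifier is to prefix the payload with a single byte, as sketched below. The source does not specify a bitstream layout, so the one-byte prefix and the numeric values 0, 1 and 2 are hypothetical choices made for illustration.

```python
# Hypothetical identifier values; the source only requires that each
# coding method be uniquely identified.
CODING_IDS = {"SSR": 0, "SBR": 1, "CELP": 2}


def pack_high_band(method: str, payload: bytes) -> bytes:
    """Prefix the encoded high-frequency payload with its coding identifier."""
    if method not in CODING_IDS:
        raise ValueError(f"unknown coding method: {method}")
    return bytes([CODING_IDS[method]]) + payload
```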

It can be understood that, because a suitable coding method is selected for every original high-frequency audio signal frame according to the priorities and coding errors of the coding methods, different original high-frequency audio signal frames may be encoded with different coding methods, which results in hybrid coding with multiple coding methods.

When selecting a coding method for the original high-frequency audio signal, determining the coding error is a key step. The method for determining the coding error of any one coding method is described next. Referring to Figure 8, the method includes:

S801. Obtain the original low-frequency audio signal frame decomposed from the original audio signal frame.

S802. Based on the original low-frequency audio signal frame, perform high-frequency reconstruction using the coding method in question to obtain a high-frequency reconstructed signal frame.

It should be noted that, following the encoding and decoding principles of the different coding methods described earlier, the high-frequency reconstruction procedure may differ from one coding method to another. If the coding method is the audio super-resolution method, S802 may be implemented by obtaining the neural network model corresponding to the audio super-resolution method and using it to predict the high-frequency reconstructed signal frame from the original low-frequency audio signal frame. The neural network model is obtained by training on training samples, which are labeled sample low-frequency audio signal frames. During training, the low-frequency features of a sample low-frequency audio signal frame serve as the model input and the high-frequency reconstructed signal frame serves as the model output. After training on a large number of samples, the model can predict a high-frequency reconstructed signal frame from a low-frequency audio signal frame. In use, the original low-frequency audio signal frame is fed to the neural network model, which extracts low-frequency features and then predicts the high-frequency reconstructed signal frame from them. In some cases, the low-frequency features of the original low-frequency audio signal frame may instead be extracted by other means and then fed to the neural network model, which outputs the high-frequency reconstructed signal frame. The neural network model may be a convolutional neural network (CNN), a long short-term memory network (LSTM), and so on, which is not limited in the embodiments of the present application.

If the coding method is the spectral band replication method, S802 may be implemented by copying the original low-frequency audio signal frame to the high-frequency band to obtain a high-frequency copied signal frame. When the low-frequency audio signal frame is copied directly to the high-frequency band, the high-frequency energy of the resulting copied frame differs somewhat from the actual high-frequency energy, whereas the envelope features reflect the high-frequency energy more accurately. The envelope features of the original high-frequency audio signal frame can therefore be extracted and used to correct the high-frequency copied signal frame, yielding the high-frequency reconstructed signal frame.
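
The copy-then-correct idea can be illustrated with a toy example. The sketch below assumes the frame is represented as a list of spectral magnitudes and that the envelope is a list of per-sub-band energies. Both representations, the function names, and the requirement that the number of bins be divisible by the number of sub-bands are assumptions made for the example; a real SBR implementation works on filter-bank sub-bands and is considerably more involved.

```python
from typing import List


def band_energies(mags: List[float], n_bands: int) -> List[float]:
    """Energy of each equal-width sub-band (a stand-in for the envelope)."""
    size = len(mags) // n_bands
    return [sum(x * x for x in mags[b * size:(b + 1) * size])
            for b in range(n_bands)]


def replicate_and_correct(low_mags: List[float],
                          target_env: List[float]) -> List[float]:
    """Copy low-band magnitudes to the high band, then match the envelope."""
    n_bands = len(target_env)
    size = len(low_mags) // n_bands
    copied = list(low_mags)  # high-frequency copied signal frame
    corrected: List[float] = []
    for b in range(n_bands):
        segment = copied[b * size:(b + 1) * size]
        energy = sum(x * x for x in segment)
        # Scale the sub-band so that its energy equals the target envelope.
        gain = (target_env[b] / energy) ** 0.5 if energy > 0 else 0.0
        corrected.extend(gain * x for x in segment)
    return corrected
```

After correction, each sub-band of the reconstructed frame has the same energy as the original high band, but the fine structure inside each sub-band still comes from the low band, which is why the envelope matches while a residual error remains.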

If the coding method is the code-excited linear prediction method, S802 may be implemented by obtaining the coding parameters from the high-frequency bitstream and obtaining the pitch period of the original low-frequency audio signal frame, and then performing high-frequency reconstruction based on the coding parameters and the pitch period to obtain the high-frequency reconstructed signal frame. The coding parameters may include LSP parameters, codebook data (for example, the fixed codebook and the adaptive codebook) and gain data (for example, the fixed codebook gain and the adaptive codebook gain).

S803. Perform error analysis based on the high-frequency reconstructed signal frame and the original high-frequency audio signal frame to obtain the corresponding coding error.

After the high-frequency reconstructed signal frame is obtained, error analysis can be performed on it and the original high-frequency audio signal frame to obtain the corresponding coding error. The coding error reflects the error between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame, and is therefore used to measure the coding quality of the coding method.

Given this role of the coding error, in one possible implementation S803 may be implemented by computing the difference signal between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame and then using the difference signal to determine the coding error. If the high-frequency reconstructed signal frame is denoted S' and the original high-frequency audio signal frame is denoted S, subtracting S from S' yields the difference signal, denoted Err.

Because the difference signal already reflects the error between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame, in one possible implementation the difference signal itself is used as the coding error, which reflects the coding error of the coding method fairly accurately.

In some cases, the error reflected by the difference signal is the error of the signal itself. The signal, however, is usually played back to a user, and the error at the level of the user's auditory perception may differ from the error of the signal itself. Therefore, in another possible implementation, psychoacoustic perceptual analysis is applied to the error signal to quantify the size of the error at the level of auditory perception. On this basis, when computing the coding error, an auditory-perception-weighted energy calculation is performed on the difference signal to obtain the difference energy, and on the original high-frequency audio signal frame to obtain the original energy, and the ratio of the difference energy to the original energy is used as the coding error. Both the difference energy and the original energy are auditory-perception-weighted energies. If the difference energy is denoted EP_err(i) and the original energy is denoted EP_s(i), the coding error may be calculated as:
w(i)=EP_err(i)/EP_s(i)    (1)

Here, w(i) is the coding error, EP_err(i) is the difference energy and EP_s(i) is the original energy. w(i) is compared with the threshold Thrd of the preset error interval: when w(i)>Thrd, the coding error exceeds the preset error interval; otherwise, it is within the preset error interval.
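
The ratio and the threshold comparison described above can be written as a short sketch. The function names are illustrative, and the handling of a zero original energy is an assumption, since the source does not cover that case.

```python
def coding_error(ep_err: float, ep_s: float) -> float:
    """Ratio w(i) of the difference energy to the original energy."""
    if ep_s <= 0.0:
        # Assumption: a silent original frame gives zero error only if the
        # difference energy is also zero, otherwise an unbounded error.
        return 0.0 if ep_err <= 0.0 else float("inf")
    return ep_err / ep_s


def within_preset_interval(w: float, thrd: float) -> bool:
    """True when w(i) <= Thrd, i.e. the error is inside the preset interval."""
    return w <= thrd
```

For example, a difference energy of 2.0 against an original energy of 40.0 gives w(i) = 0.05, which is inside the interval for Thrd = 0.1.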

In this way, the coding error is measured in terms of auditory perception, so that coding quality is safeguarded at the level of auditory perception.

Auditory perception is based mainly on loudness. Loudness varies with the intensity of the audio signal but is also affected by frequency: audio signals of the same intensity at different frequencies are perceived differently by the human ear. Figure 9 shows the acoustic equal-loudness curves measured by an international acoustics standards organization, provided by an embodiment of the present application. An equal-loudness curve describes the relationship between sound pressure level and frequency under equal-loudness conditions and is one of the important auditory characteristics. In other words, it shows what sound pressure level an audio signal must reach at each frequency to be perceived by the user as equally loud. To illustrate, take any one of the equal-loudness curves in Figure 9. For low and mid frequencies (below 1 kHz), the lower the frequency, the greater the sound pressure (that is, energy) needed for equal loudness; put simply, more energy is needed to give the user the same auditory sensation. For mid and high frequencies (above 1 kHz), audio in different frequency bands has different perceptual characteristics. The auditory-perception-weighted energy may therefore be calculated as follows:

1) Framing and windowing:

For the input audio signal (for example, the difference signal or the original high-frequency audio signal frame in the embodiments of the present application), an analysis window of 20 ms per frame is typically used (consistent with the encoder's frame definition). The window function may be a Hanning window or a Hamming window.

2) Power spectrum calculation:

Apply a Fourier transform to the framed and windowed audio signal and compute the energy p(i,k) of each frequency bin of the i-th frame, k=0~K-1, where K is the total number of frequency bins.

3) Auditory-perception-weighted energy calculation:

Multiply the energy of each frequency bin k by its auditory perception weighting coefficient and sum the products to obtain the auditory-perception-weighted energy of the current frame. The formula is:
EP(i)=Σ_{k=0}^{K-1} p(i,k)*cof(k)    (2)

Here, EP(i) is the auditory-perception-weighted energy of the i-th audio signal frame, i is the frame index, k is the frequency bin index, and cof(k) is the auditory perception weighting coefficient of the k-th frequency bin.

Thus, when the i-th audio signal frame is the current original high-frequency audio signal frame, the computed EP(i) is the original energy EP_s(i); when the i-th audio signal frame is the corresponding difference signal, the computed EP(i) is the difference energy EP_err(i).
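
The three steps (windowing, power spectrum, weighted sum) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: it uses a Hann window, a naive discrete Fourier transform for clarity instead of an FFT, and keeps only the K = N/2 + 1 non-negative-frequency bins. The weighting coefficients `cof` are passed in by the caller.

```python
import cmath
import math
from typing import List, Sequence


def hann(n_samples: int) -> List[float]:
    """Hann (Hanning) window of the given length."""
    if n_samples == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Energy p(i,k) of each bin k = 0..K-1 of one windowed frame."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    k_total = n // 2 + 1  # keep the non-negative frequencies only
    spectrum = []
    for k in range(k_total):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def weighted_energy(frame: Sequence[float], cof: Sequence[float]) -> float:
    """EP(i): sum over k of p(i,k) * cof(k)."""
    p = power_spectrum(frame)
    if len(cof) != len(p):
        raise ValueError("cof must have one coefficient per frequency bin")
    return sum(pk * ck for pk, ck in zip(p, cof))
```

Calling `weighted_energy` once on the original high-frequency frame and once on the difference signal gives EP_s(i) and EP_err(i) respectively.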

The auditory perception weighting coefficients used in the embodiments of the present application are computed from the psychoacoustic equal-loudness curve data of the BS3383 standard, using the following formula:
cof(freq)=(10^(loud/20))/1000    (3)

Here, freq denotes the frequency, cof(freq) corresponds to the auditory perception weighting coefficient of the k-th frequency bin, and loud denotes the loudness value at frequency freq.

It should be noted that the loudness value loud at frequency freq can be calculated with the following formulas:
loud=4.2+afy*(dB-cfy)/(1+bfy*(dB-cfy))   (4)
afy=af(j-1)+(freq-ff(j-1))*(af(j)-af(j-1))/(ff(j)-ff(j-1))   (5)
bfy=bf(j-1)+(freq-ff(j-1))*(bf(j)-bf(j-1))/(ff(j)-ff(j-1))   (6)
cfy=cf(j-1)+(freq-ff(j-1))*(cf(j)-cf(j-1))/(ff(j)-ff(j-1))   (7)

Here, ff, af, bf and cf correspond to the data in the equal-loudness curve data table published in the BS3383 standard and can be looked up in that table; j is the entry number in the table; freq is the frequency whose loudness value loud is to be calculated. The loudness value is obtained by linearly interpolating the data in the equal-loudness curve data table.

It can be understood that the freq for which the loudness value is calculated with the above formulas usually lies between the frequencies of table entries j-1 and j. The auditory perception weighting coefficients calculated with the above formulas are plotted in Figure 10, which shows the weighting coefficient for each frequency.
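
Formulas (3) to (7) can be sketched as below. The sketch makes several assumptions that are not in the source: the tables `ff`, `af`, `bf` and `cf` are passed in by the caller (no standard table values are reproduced here), `ff` is sorted in ascending order, frequencies outside the table are linearly extrapolated from the nearest pair of entries, `db` is the level substituted for dB in formula (4), and formula (3) is read as 10 raised to the power loud/20, divided by 1000.

```python
import bisect
from typing import Sequence


def interpolate(freq: float, ff: Sequence[float],
                values: Sequence[float]) -> float:
    """Linear interpolation between table entries j-1 and j, as in (5)-(7)."""
    j = bisect.bisect_left(ff, freq)
    j = min(max(j, 1), len(ff) - 1)  # clamp so that j-1 and j are valid
    return values[j - 1] + (freq - ff[j - 1]) * (values[j] - values[j - 1]) / (
        ff[j] - ff[j - 1])


def loudness(freq: float, db: float, ff: Sequence[float], af: Sequence[float],
             bf: Sequence[float], cf: Sequence[float]) -> float:
    """Loudness value 'loud' of formula (4)."""
    afy = interpolate(freq, ff, af)
    bfy = interpolate(freq, ff, bf)
    cfy = interpolate(freq, ff, cf)
    return 4.2 + afy * (db - cfy) / (1.0 + bfy * (db - cfy))


def cof(freq: float, db: float, ff: Sequence[float], af: Sequence[float],
        bf: Sequence[float], cf: Sequence[float]) -> float:
    """Weighting coefficient of formula (3), read as 10^(loud/20) / 1000."""
    return (10.0 ** (loudness(freq, db, ff, af, bf, cf) / 20.0)) / 1000.0
```

In practice the coefficients would be computed once per frequency bin and reused for every frame, since they depend only on the bin frequency and the table data.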

The embodiments of the present application also provide a method for decoding a high-frequency audio signal, described from the perspective of the receiving end. Referring to Figure 11, the method includes:

S1101. Receive the high-frequency bitstream sent by the sending end, where the high-frequency bitstream carries a coding identifier that indicates the coding method used to produce the high-frequency bitstream.

S1102. Parse the high-frequency bitstream to obtain its coding identifier, and determine the coding method indicated by the coding identifier.

S1103. Decode the high-frequency bitstream using the decoding method corresponding to the coding method to obtain a high-frequency audio signal frame.

After receiving the high-frequency bitstream, the receiving end parses it to obtain the high-frequency bitstream and the corresponding coding identifier, and then decodes the high-frequency bitstream using the decoding method that corresponds to the coding method indicated by the coding identifier, obtaining the high-frequency audio signal frame.
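
Steps S1101 to S1103 amount to parsing the identifier and dispatching to the matching decoder, as sketched below. The one-byte identifier prefix, the numeric values and the `decoders` mapping are hypothetical; the source only states that the identifier indicates the coding method.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical identifier values, not taken from the source.
ID_TO_METHOD = {0: "SSR", 1: "SBR", 2: "CELP"}


def parse_high_band(stream: bytes) -> Tuple[str, bytes]:
    """Split a received stream into (coding method, payload)."""
    if not stream:
        raise ValueError("empty high-frequency bitstream")
    ident = stream[0]
    if ident not in ID_TO_METHOD:
        raise ValueError(f"unknown coding identifier: {ident}")
    return ID_TO_METHOD[ident], stream[1:]


def decode_high_band(
    stream: bytes,
    decoders: Dict[str, Callable[[bytes], List[float]]],
) -> List[float]:
    """Decode with the decoder that matches the indicated coding method."""
    method, payload = parse_high_band(stream)
    return decoders[method](payload)
```

Because the identifier travels with every frame, consecutive frames can be decoded with different decoders without any out-of-band signalling.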

As can be seen from the above technical solution, for the original high-frequency audio signal of an original audio signal, the present application proposes a high-frequency audio signal encoding and decoding method that mixes multiple coding methods based on a coding-error decision. Specifically, for each original audio signal frame in the original audio signal, the original high-frequency audio signal frame decomposed from the original audio signal frame and multiple coding methods are obtained. Each coding method has a corresponding priority, which indicates the order of preference for using that coding method. Typically, to keep the bandwidth of audio signal transmission as low as possible, the number of coding bits of the coding methods increases in order of priority from high to low. Then, according to the priorities of the coding methods, a coding method whose coding error is within the preset error interval is determined from the multiple coding methods as the target coding method, where the coding error of a coding method is produced by encoding the original high-frequency audio signal frame with that coding method. The target coding method is thus determined with the coding error as the decision criterion and the lowest number of coding bits as the goal. The high-frequency bitstream obtained by encoding the original high-frequency audio signal frame with the target coding method is sent to the receiving end, so that a coding method with a small number of coding bits is used whenever the coding quality permits, which reduces the bandwidth of audio signal transmission. The high-frequency bitstream carries a coding identifier that indicates the coding method used to produce it, so that the decoding end can determine from the coding identifier which decoding method to use for the received high-frequency bitstream. The present application can therefore select a coding method with a small number of coding bits whenever the coding quality permits, achieving satisfactory results in both the number of coding bits and the coding quality: fewer coding bits and high-quality audio.

The present application also provides a method for encoding and decoding a high-frequency audio signal, described from the perspective of the overall architecture of the sending end and the receiving end. The embodiments of the present application take as an example the case where the multiple coding methods are the SSR method, the SBR method and the CELP coding method, with priority from high to low in that order. The overall architecture for encoding and decoding the high-frequency audio signal is shown in Figure 12.

The original high-frequency audio signal frame and the original low-frequency audio signal frame are input (see 1201 in Figure 12). They are obtained by decomposing the original audio signal frame into high and low frequencies (for example, with a QMF filter bank), and the original low-frequency audio signal frame can be used for subsequent high-frequency reconstruction.

In the high-frequency audio encoding stage, encoding with the SSR method is tried first to obtain a high-frequency bitstream (see 1202 in Figure 12), followed by high-frequency reconstruction (see 1203 in Figure 12). Based on the resulting high-frequency reconstructed signal frame and the original high-frequency audio signal frame, it is determined whether the coding error is within the preset error interval (see 1204 in Figure 12). If so, the high-frequency bitstream is sent to the receiving end (see 1209 in Figure 12). If not, encoding with the SBR method is tried to obtain a high-frequency bitstream (see 1205 in Figure 12), followed by high-frequency reconstruction (see 1206 in Figure 12), and it is determined whether the coding error is within the preset error interval (see 1207 in Figure 12). If so, the high-frequency bitstream is sent to the receiving end (see 1209 in Figure 12). If not, the CELP coding method is used (see 1208 in Figure 12) and the high-frequency bitstream is sent to the receiving end (see 1209 in Figure 12). In every case the high-frequency bitstream carries the corresponding coding identifier. In the high-frequency audio decoding stage, the high-frequency bitstream and the coding identifier are obtained by parsing (see 1210 in Figure 12), and the high-frequency bitstream is decoded using the decoding method that corresponds to the coding method indicated by the coding identifier (see 1211 in Figure 12). This process yields the decoded high-frequency audio signal frame (see 1212 in Figure 12).

It should be noted that the implementations provided in the above aspects of this application may be further combined to provide additional implementations.

Based on the method for encoding a high-frequency audio signal provided in the embodiment corresponding to Figure 2, an embodiment of the present application further provides a device 1300 for encoding a high-frequency audio signal. Referring to Figure 13, the encoding device 1300 includes an acquisition unit 1301, a determination unit 1302 and a sending unit 1303:

The acquisition unit 1301 is configured to acquire multiple encoding methods, and to acquire an original high-frequency audio signal frame decomposed from an original audio signal frame;

The acquisition unit 1301 is further configured to acquire the priority of each of the multiple encoding methods, where the number of coding bits of the encoding methods increases in order of priority from high to low;

The determination unit 1302 is configured to determine, according to the priorities of the encoding methods, an encoding method whose coding error is within a preset error interval from the multiple encoding methods as the target encoding method, the coding error of an encoding method being the error produced by encoding the original high-frequency audio signal frame with that encoding method;

The sending unit 1303 is configured to send, to the receiving end, the high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target encoding method, where the high-frequency code stream has an encoding identifier that indicates the encoding method used to produce the high-frequency code stream.

In a possible implementation, the determination unit 1302 is specifically configured to:

select a candidate encoding method from the multiple encoding methods one at a time, in order of priority from high to low;

determine the coding error of the candidate encoding method;

if the coding error of the candidate encoding method is within the preset error interval, determine the candidate encoding method as the target encoding method and stop selecting further candidates.
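The sequential selection above can be sketched as follows. This is a minimal illustration, not the implementation in this application: the `(name, error_fn)` representation of an encoding method, the function name `select_first_acceptable`, and the assumption that the preset error interval is `[0, max_error]` are all hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical representation: each encoding method is a (name, error_fn) pair,
# where error_fn returns the coding error for a given high-frequency frame.
Method = Tuple[str, Callable[[Sequence[float]], float]]


def select_first_acceptable(
    methods_by_priority: List[Method],
    frame: Sequence[float],
    max_error: float,
) -> Optional[str]:
    """Try methods from highest to lowest priority and stop at the first one
    whose coding error lies in the assumed interval [0, max_error]."""
    for name, error_fn in methods_by_priority:
        error = error_fn(frame)  # only evaluated for methods actually tried
        if 0.0 <= error <= max_error:
            return name  # target method found; later methods are never run
    return None  # no method met the error requirement
```

Because evaluation stops at the first acceptable method, lower-priority methods (which use more bits) are only run when the cheaper ones fail. In the Figure 12 flow the last method (CELP) is sent without a further error check, so a caller could fall back to the final entry when this sketch returns `None`.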

In a possible implementation, the determination unit 1302 is specifically configured to:

determine the coding error of each of the multiple encoding methods separately;

from the encoding methods whose coding errors are within the preset error interval, determine the encoding method with the highest priority as the target encoding method.
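For contrast with the sequential variant, the evaluate-all variant can be sketched as below. The dictionary layout, the convention that a smaller number means a higher priority, and the interval `[0, max_error]` are assumptions made for illustration only.

```python
from typing import Dict, Optional


def select_best_acceptable(
    errors: Dict[str, float],
    priority: Dict[str, int],
    max_error: float,
) -> Optional[str]:
    """Given the coding error of every method, keep those inside the assumed
    interval [0, max_error] and return the one with the highest priority.

    Assumption: a smaller number in `priority` means a higher priority.
    """
    acceptable = [name for name, err in errors.items() if 0.0 <= err <= max_error]
    if not acceptable:
        return None
    return min(acceptable, key=lambda name: priority[name])
```

This variant always computes every coding error, so it costs more than the sequential one, but the errors can be computed independently of each other (for example in parallel).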

In a possible implementation, for any one of the multiple encoding methods, the device further includes a reconstruction unit and an error analysis unit:

The acquisition unit 1301 is further configured to acquire an original low-frequency audio signal frame decomposed from the original audio signal frame;

The reconstruction unit is configured to perform high-frequency reconstruction with that encoding method according to the original low-frequency audio signal frame, to obtain a high-frequency reconstructed signal frame;

The error analysis unit is configured to perform error analysis based on the high-frequency reconstructed signal frame and the original high-frequency audio signal frame, to obtain the corresponding coding error.

In a possible implementation, the error analysis unit is specifically configured to:

calculate a difference signal between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame;

determine the coding error using the difference signal.

In a possible implementation, the error analysis unit is specifically configured to:

use the difference signal as the coding error;

or,

perform an auditory-perception-weighted energy calculation on the difference signal to obtain a difference energy, and perform an auditory-perception-weighted energy calculation on the original high-frequency audio signal frame to obtain an original energy;

use the ratio of the difference energy to the original energy as the coding error.
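The energy-ratio form of the coding error can be sketched as follows. The text does not specify the perceptual weighting, so this sketch assumes the frames are given as spectral coefficients and that `weights` holds one hypothetical perceptual weight per coefficient; the function names are illustrative.

```python
from typing import Sequence


def weighted_energy(coeffs: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of squared coefficients, each scaled by its (assumed) perceptual weight."""
    return sum(w * c * c for c, w in zip(coeffs, weights))


def coding_error(
    original: Sequence[float],
    reconstructed: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Ratio of the weighted energy of the difference signal to the weighted
    energy of the original high-frequency frame."""
    diff = [r - o for o, r in zip(original, reconstructed)]
    diff_energy = weighted_energy(diff, weights)
    original_energy = weighted_energy(original, weights)
    if original_energy == 0.0:
        # Silent original frame: the ratio is undefined, so report 0 for a
        # perfect match and infinity otherwise (a choice made for this sketch).
        return 0.0 if diff_energy == 0.0 else float("inf")
    return diff_energy / original_energy
```

Normalising by the original energy makes the error independent of the frame's overall level, so a single preset error interval can be applied to quiet and loud frames alike.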

In a possible implementation, if the encoding method is the audio super-resolution method, the reconstruction unit is specifically configured to:

obtain the neural network model corresponding to the audio super-resolution method;

perform feature extraction on the original low-frequency audio signal frame to obtain low-frequency features;

predict the high-frequency reconstructed signal frame from the low-frequency features using the neural network model.

In a possible implementation, if the encoding method is the spectral band replication (SBR) method, the reconstruction unit is specifically configured to:

copy the original low-frequency audio signal frame to the high-frequency band to obtain a high-frequency copied signal frame;

extract envelope features of the original high-frequency audio signal frame;

correct the high-frequency copied signal frame using the envelope features to obtain the high-frequency reconstructed signal frame.
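The three band-replication steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the SBR tool of any codec standard: frames are lists of spectral coefficients, the low and high bands have equal length, the envelope is a per-sub-band RMS value, and `sbr_reconstruct`, `band_rms` and `band_size` are hypothetical names.

```python
import math
from typing import List, Sequence


def band_rms(coeffs: Sequence[float], start: int, stop: int) -> float:
    """Root-mean-square level of coeffs[start:stop], used here as the envelope."""
    return math.sqrt(sum(c * c for c in coeffs[start:stop]) / (stop - start))


def sbr_reconstruct(
    low: Sequence[float],
    high_original: Sequence[float],
    band_size: int,
) -> List[float]:
    """Copy the low band into the high band, then rescale each sub-band so its
    RMS level matches the envelope of the original high band."""
    if len(low) != len(high_original):
        raise ValueError("this sketch assumes equal-length low and high bands")
    copied = list(low)  # step 1: low-frequency content copied to the high band
    reconstructed: List[float] = []
    for start in range(0, len(copied), band_size):
        stop = min(start + band_size, len(copied))
        target = band_rms(high_original, start, stop)  # step 2: envelope feature
        source = band_rms(copied, start, stop)
        gain = target / source if source > 0.0 else 0.0
        # step 3: correct the copied sub-band with the envelope gain
        reconstructed.extend(c * gain for c in copied[start:stop])
    return reconstructed
```

Only the envelope values depend on the original high band, which is why a scheme of this kind needs few bits: the fine structure is borrowed from the low band.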

In a possible implementation, if the encoding method is the code-excited linear prediction (CELP) encoding method, the reconstruction unit is specifically configured to:

obtain coding parameters from the high-frequency code stream, and obtain the pitch period of the original low-frequency audio signal frame;

perform high-frequency reconstruction according to the coding parameters and the pitch period to obtain the high-frequency reconstructed signal frame.

Based on the method for decoding a high-frequency audio signal provided in the embodiment corresponding to Figure 11, an embodiment of the present application further provides a device 1400 for decoding a high-frequency audio signal. Referring to Figure 14, the decoding device 1400 includes a receiving unit 1401, a parsing unit 1402 and a decoding unit 1403:

The receiving unit 1401 is configured to receive a high-frequency code stream sent by the sending end, where the high-frequency code stream has an encoding identifier that indicates the encoding method used to produce the high-frequency code stream;

The parsing unit 1402 is configured to parse out the encoding identifier corresponding to the high-frequency code stream, and to determine the encoding method indicated by the encoding identifier;

The decoding unit 1403 is configured to decode the high-frequency code stream using the decoding method corresponding to that encoding method, to obtain a high-frequency audio signal frame.
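The parse-then-dispatch behaviour of the decoding device can be sketched as below. The layout of the code stream is not given in this section, so the one-byte identifier placed in front of the payload, and the names `pack`, `parse` and `decode`, are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Tuple

Decoder = Callable[[bytes], List[float]]


def pack(method_id: int, payload: bytes) -> bytes:
    """Encoder side: prefix the payload with a one-byte encoding identifier
    (hypothetical layout; method_id must be in 0..255)."""
    return bytes([method_id]) + payload


def parse(stream: bytes) -> Tuple[int, bytes]:
    """Split a code stream into its encoding identifier and payload."""
    if not stream:
        raise ValueError("empty code stream")
    return stream[0], stream[1:]


def decode(stream: bytes, decoders: Dict[int, Decoder]) -> List[float]:
    """Look up the decoder matching the identifier and apply it to the payload."""
    method_id, payload = parse(stream)
    if method_id not in decoders:
        raise ValueError(f"unknown encoding identifier: {method_id}")
    return decoders[method_id](payload)
```

Because the identifier travels with every code stream, the encoder is free to switch methods from frame to frame without any side channel, and the decoder needs no knowledge of the encoder's error test.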

An embodiment of the present application further provides a computer device that can execute the methods for encoding and decoding a high-frequency audio signal. The computer device may be, for example, a terminal. Taking a smartphone as an example of the terminal:

Figure 15 is a block diagram of part of the structure of a smartphone provided by an embodiment of the present application. Referring to Figure 15, the smartphone includes components such as a radio frequency (RF) circuit 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (WiFi) module 1570, a processor 1580 and a power supply 1590. The input unit 1530 may include a touch panel 1531 and other input devices 1532, the display unit 1540 may include a display panel 1541, and the audio circuit 1560 may include a speaker 1561 and a microphone 1562. It can be understood that the smartphone structure shown in Figure 15 does not limit the smartphone, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

The memory 1520 can be used to store software programs and modules. The processor 1580 performs the various functional applications and data processing of the smartphone by running the software programs and modules stored in the memory 1520. The memory 1520 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the smartphone (such as audio data or a phone book). In addition, the memory 1520 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.

The processor 1580 is the control center of the smartphone. It connects the various parts of the smartphone through various interfaces and lines, and performs the functions of the smartphone and processes data by running or executing the software programs and/or modules stored in the memory 1520 and by calling the data stored in the memory 1520. Optionally, the processor 1580 may include one or more processing units; preferably, the processor 1580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 1580.

In this embodiment, the processor 1580 in the smartphone can perform the following steps:

acquiring multiple encoding methods, and acquiring an original high-frequency audio signal frame decomposed from an original audio signal frame;

acquiring the priority of each of the multiple encoding methods, where the number of coding bits of the encoding methods increases in order of priority from high to low;

determining, according to the priorities of the encoding methods, an encoding method whose coding error is within a preset error interval from the multiple encoding methods as the target encoding method, the coding error of an encoding method being the error produced by encoding the original high-frequency audio signal frame with that encoding method;

sending, to the receiving end, the high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target encoding method, where the high-frequency code stream has an encoding identifier that indicates the encoding method used to produce the high-frequency code stream.

Alternatively, the processor 1580 can perform the following steps:

receiving a high-frequency code stream sent by the sending end, where the high-frequency code stream has an encoding identifier that indicates the encoding method used to produce the high-frequency code stream;

parsing out the encoding identifier corresponding to the high-frequency code stream, and determining the encoding method indicated by the encoding identifier;

decoding the high-frequency code stream using the decoding method corresponding to that encoding method, to obtain a high-frequency audio signal frame.

An embodiment of the present application further provides a server. Referring to Figure 16, which is a structural diagram of the server 1600 provided by an embodiment of the present application, the server 1600 may vary considerably depending on configuration or performance, and may include one or more central processing units (CPUs) 1622 (for example, one or more processors), a memory 1632, and one or more storage media 1630 (for example, one or more mass storage devices) that store application programs 1642 or data 1644. The memory 1632 and the storage medium 1630 may provide transient or persistent storage. The program stored in the storage medium 1630 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Furthermore, the central processing unit 1622 may be configured to communicate with the storage medium 1630 and to execute, on the server 1600, the series of instruction operations in the storage medium 1630.

The server 1600 may also include one or more power supplies 1626, one or more wired or wireless network interfaces 1650, one or more input/output interfaces 1658, and/or one or more operating systems 1641, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.

In this embodiment, the steps performed by the central processing unit 1622 in the server 1600 can be implemented based on the structure shown in Figure 16.

According to one aspect of the present application, a computer-readable storage medium is provided. The computer-readable storage medium is used to store a computer program, and the computer program is used to execute the methods for encoding and decoding a high-frequency audio signal described in the foregoing embodiments.

According to one aspect of the present application, a computer program product or computer program is provided. The computer program product or computer program includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the methods provided in the various optional implementations of the above embodiments.

The descriptions of the processes or structures corresponding to the above drawings each have their own emphasis. For parts that are not described in detail in one process or structure, refer to the relevant descriptions of the other processes or structures.

The terms "first", "second", "third", "fourth" and so on (if present) in the description of this application and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described herein can, for example, be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.

In the several embodiments provided in this application, it should be understood that the disclosed systems, devices and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.

The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (15)

一种高频音频信号的编码方法,所述方法由计算机设备执行,所述方法包括:A method for encoding high-frequency audio signals, the method is executed by a computer device, the method includes: 获取多种编码方式,以及获取从原始音频信号帧中分解得到的原始高频音频信号帧;Obtain multiple encoding methods, and obtain the original high-frequency audio signal frame decomposed from the original audio signal frame; 获取所述多种编码方式分别对应的优先级,按照所述优先级从高到低的顺序,编码方式的编码比特数量递增;Obtain the priorities corresponding to the multiple encoding methods, and in the order of the priorities from high to low, the number of encoding bits of the encoding method increases; 根据编码方式的优先级,从所述多种编码方式中确定编码误差在误差预设区间内的编码方式作为目标编码方式,编码方式的编码误差是利用编码方式对所述原始高频音频信号帧进行编码产生的;According to the priority of the coding method, the coding method whose coding error is within the error preset interval is determined from the multiple coding methods as the target coding method. The coding error of the coding method is the encoding method of the original high-frequency audio signal frame. Generated by coding; 将利用所述目标编码方式对所述原始高频音频信号帧进行编码得到的高频码流发送至接收端,所述高频码流具有编码标识,所述编码标识用于指示编码得到所述高频码流所使用的编码方式。The high-frequency code stream obtained by encoding the original high-frequency audio signal frame using the target encoding method is sent to the receiving end. The high-frequency code stream has an encoding identifier, and the encoding identifier is used to indicate that the encoding obtained by The encoding method used for high-frequency code streams. 
根据权利要求1所述的方法,所述根据编码方式的优先级,从所述多种编码方式中确定编码误差在误差预设区间内的编码方式作为目标编码方式,包括:The method according to claim 1, wherein the coding method in which the coding error is within a preset error interval is determined from the plurality of coding methods as the target coding method according to the priority of the coding method, including: 按照所述优先级从高到低的顺序,依次从所述多种编码方式中选择待定编码方式;Select the undetermined encoding method from the plurality of encoding methods in order from high to low priority; 确定所述待定编码方式的编码误差;Determine the coding error of the undetermined coding method; 若所述待定编码方式的编码误差在所述误差预设区间内,将所述待定编码方式确定为所述目标编码方式,停止继续选择待定编码方式。If the coding error of the undetermined encoding method is within the error preset interval, the undetermined encoding method is determined as the target encoding method, and the continued selection of the undetermined encoding method is stopped. 根据权利要求1所述的方法,所述根据编码方式的优先级,从所述多种编码方式中确定编码误差在误差预设区间内的编码方式作为目标编码方式,包括:The method according to claim 1, wherein the coding method in which the coding error is within a preset error interval is determined from the plurality of coding methods as the target coding method according to the priority of the coding method, including: 分别确定所述多种编码方式中每种编码方式的编码误差;Determine the coding error of each coding method in the plurality of coding methods respectively; 从编码误差在所述误差预设区间内的编码方式中,确定优先级最高的编码方式作为所述目标编码方式。From the coding methods with coding errors within the error preset interval, the coding method with the highest priority is determined as the target coding method. 
根据权利要求1所述的方法,对于所述多种编码方式的任一编码方式,所述任一编码方式的编码误差的确定方式包括:The method according to claim 1, for any one of the plurality of coding methods, the method for determining the coding error of any one of the coding methods includes: 获取从所述原始音频信号帧中分解得到的原始低频音频信号帧;Obtain an original low-frequency audio signal frame decomposed from the original audio signal frame; 根据所述原始低频音频信号帧,利用所述任一编码方式进行高频重建,得到高频重建信号帧;According to the original low-frequency audio signal frame, use any of the encoding methods to perform high-frequency reconstruction to obtain a high-frequency reconstructed signal frame; 基于所述高频重建信号帧和所述原始高频音频信号帧进行误差分析,得到对应的编码误差。Error analysis is performed based on the high-frequency reconstructed signal frame and the original high-frequency audio signal frame to obtain the corresponding coding error. 根据权利要求4所述的方法,所述基于所述高频重建信号帧和所述原始高频音频信号帧进行误差分析,得到所述编码误差,包括:The method according to claim 4, wherein performing error analysis based on the high-frequency reconstructed signal frame and the original high-frequency audio signal frame to obtain the encoding error includes: 计算所述高频重建信号帧和所述原始高频音频信号帧之间的差值信号;Calculate a difference signal between the high-frequency reconstructed signal frame and the original high-frequency audio signal frame; 利用所述差值信号确定所述编码误差。The coding error is determined using the difference signal. 根据权利要求5所述的方法,所述利用所述差值确定所述编码误差,包括:The method according to claim 5, using the difference value to determine the encoding error includes: 将所述差值信号作为所述编码误差;Use the difference signal as the coding error; 或者,or, 对所述差值信号进行听觉感知加权能量计算得到差值能量,以及对所述原始高频音频 信号帧进行听觉感知加权能量计算得到原始能量;Perform auditory perception weighted energy calculation on the difference signal to obtain the difference energy, and calculate the original high-frequency audio The signal frame is subjected to auditory perception weighted energy calculation to obtain the original energy; 将所述差值能量和所述原始能量的比值作为所述编码误差。The ratio of the difference energy to the original energy is used as the encoding error. 
根据权利要求4-6任一项所述的方法,若所述任一编码方式为音频超分辨率方式,所述根据所述原始低频音频信号帧,利用所述任一编码方式进行高频重建,得到高频重建信号帧,包括:The method according to any one of claims 4 to 6, if any coding method is an audio super-resolution method, using any coding method to perform high-frequency reconstruction based on the original low-frequency audio signal frame , obtain the high-frequency reconstructed signal frame, including: 获取所述音频超分辨率方式对应的神经网络模型;Obtain the neural network model corresponding to the audio super-resolution method; 对所述原始低频音频信号帧进行特征提取,得到低频特征;Perform feature extraction on the original low-frequency audio signal frame to obtain low-frequency features; 根据所述低频特征,通过所述神经网络模型进行预测得到所述高频重建信号帧。According to the low-frequency characteristics, the high-frequency reconstructed signal frame is obtained by predicting through the neural network model. 根据权利要求4-6任一项所述的方法,若所述任一编码方式为频带复制方式,所述根据所述原始低频音频信号帧,利用所述任一编码方式进行高频重建,得到高频重建信号帧,包括:The method according to any one of claims 4 to 6, if any of the coding methods is a frequency band replication method, and using any of the coding methods to perform high-frequency reconstruction based on the original low-frequency audio signal frame, we obtain High-frequency reconstructed signal frames, including: 将所述原始低频音频信号帧复制到高频频段,得到高频复制信号帧;Copy the original low-frequency audio signal frame to the high-frequency band to obtain a high-frequency copied signal frame; 提取所述原始高频音频信号帧的包络特征;Extract envelope features of the original high-frequency audio signal frame; 利用所述包络特征对所述高频复制信号帧进行校正,得到所述高频重建信号帧。The high-frequency replica signal frame is corrected using the envelope feature to obtain the high-frequency reconstructed signal frame. 
根据权利要求4-6任一项所述的方法,若所述任一编码方式为码激励线性预测编码方式,所述根据所述原始低频音频信号帧,利用所述任一编码方式进行高频重建,得到高频重建信号帧,包括:The method according to any one of claims 4 to 6, if any of the coding methods is a code-excited linear prediction coding method, and based on the original low-frequency audio signal frame, using any of the coding methods to perform high-frequency Reconstruct to obtain high-frequency reconstructed signal frames, including: 从所述高频码流中获取编码参数,以及获取所述原始低频音频信号帧的基音周期;Obtain coding parameters from the high-frequency code stream, and obtain the pitch period of the original low-frequency audio signal frame; 根据所述编码参数和所述基音周期进行高频重建,得到所述高频重建信号帧。High-frequency reconstruction is performed according to the coding parameters and the pitch period to obtain the high-frequency reconstructed signal frame. 一种高频音频信号的解码方法,所述方法由计算机设备执行,所述方法包括:A method for decoding high-frequency audio signals, the method is executed by a computer device, the method includes: 接收发送端发送的高频码流,所述高频码流具有编码标识,所述编码标识用于指示编码得到所述高频码流所使用的编码方式;Receive the high-frequency code stream sent by the transmitting end, the high-frequency code stream has a coding identifier, and the coding identifier is used to indicate the coding method used to encode the high-frequency code stream; 解析得到所述高频码流对应的编码标识,并确定所述编码标识所指示的编码方式;Analyze to obtain the encoding identifier corresponding to the high-frequency code stream, and determine the encoding method indicated by the encoding identifier; 根据所述编码方式对应的解码方式对所述高频码流进行解码,得到高频音频信号帧。The high-frequency code stream is decoded according to the decoding method corresponding to the encoding method to obtain a high-frequency audio signal frame. 

An apparatus for encoding a high-frequency audio signal, the apparatus comprising an acquisition unit, a determination unit and a sending unit, wherein:
the acquisition unit is configured to acquire multiple coding methods, and to acquire an original high-frequency audio signal frame decomposed from an original audio signal frame;
the acquisition unit is further configured to acquire the priority of each of the multiple coding methods, wherein the number of coding bits of the coding methods increases in order of priority from high to low;
the determination unit is configured to determine, according to the priorities of the coding methods, a coding method whose coding error falls within a preset error interval from among the multiple coding methods as the target coding method, the coding error of a coding method being the error produced by encoding the original high-frequency audio signal frame with that coding method; and
the sending unit is configured to send, to a receiving end, a high-frequency code stream obtained by encoding the original high-frequency audio signal frame with the target coding method, the high-frequency code stream having a coding identifier that indicates the coding method used to encode the high-frequency code stream.

An apparatus for decoding a high-frequency audio signal, the apparatus comprising a receiving unit, a parsing unit and a decoding unit, wherein:
the receiving unit is configured to receive a high-frequency code stream sent by a transmitting end, the high-frequency code stream having a coding identifier that indicates the coding method used to encode the high-frequency code stream;
the parsing unit is configured to parse the high-frequency code stream to obtain its coding identifier, and to determine the coding method indicated by the coding identifier; and
the decoding unit is configured to decode the high-frequency code stream with the decoding method corresponding to that coding method to obtain a high-frequency audio signal frame.

A computer device comprising a processor and a memory, wherein:
the memory is configured to store a computer program and to transmit the computer program to the processor; and
the processor is configured to execute, according to the computer program, the method according to any one of claims 1 to 10.

A computer-readable storage medium configured to store a computer program that, when executed by a processor, causes the processor to perform the method according to any one of claims 1 to 10.

A computer program product comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1 to 10.

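
The priority-ordered selection and the identifier-based decoding recited in the claims above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the uniform quantizers stand in for the real coding methods, the error measure (mean squared error against a local decode), the one-sided error interval `[0, max_error]`, the one-byte identifier at the start of the stream, and the fallback to the lowest-priority method are all assumptions made for illustration. Every name in it (`Codec`, `CODECS`, `encode_high_band`, `decode_high_band`) is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Codec:
    codec_id: int  # value written into the stream as the coding identifier
    bits: int      # nominal bits per sample (1..8 in this sketch)


# Highest priority first; the bit count grows as the priority drops.
CODECS = [Codec(0, 2), Codec(1, 4), Codec(2, 8)]
BY_ID = {c.codec_id: c for c in CODECS}


def encode_samples(frame: Sequence[float], bits: int) -> bytes:
    """Stand-in coding method: uniform quantization of samples in [-1, 1].

    Each index is stored in one byte for simplicity (no real bit packing).
    """
    top = (1 << bits) - 1
    return bytes(min(top, max(0, round((x + 1.0) / 2.0 * top))) for x in frame)


def decode_samples(payload: bytes, bits: int) -> List[float]:
    top = (1 << bits) - 1
    return [q / top * 2.0 - 1.0 for q in payload]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def encode_high_band(frame: Sequence[float], max_error: float) -> bytes:
    """Try methods from high to low priority; keep the first one whose
    coding error lies in the assumed interval [0, max_error]."""
    for codec in CODECS:
        payload = encode_samples(frame, codec.bits)
        error = mse(frame, decode_samples(payload, codec.bits))
        if error <= max_error:
            break
    # If no method qualified, `codec` and `payload` still hold the last
    # (lowest-priority, most bits) attempt. This fallback is an assumption.
    return bytes([codec.codec_id]) + payload


def decode_high_band(stream: bytes) -> List[float]:
    """Parse the coding identifier, then dispatch to the matching decoder."""
    codec = BY_ID[stream[0]]
    return decode_samples(stream[1:], codec.bits)


if __name__ == "__main__":
    frame = [0.0, 0.5, -0.5, 0.9]
    stream = encode_high_band(frame, max_error=0.01)
    print("coding identifier:", stream[0])
    print("decoded frame:", decode_high_band(stream))
```

Because the methods are tried from the cheapest upward, a loose error bound selects a low-bit method and a tight bound forces a more expensive one, which is the trade-off the priority ordering expresses.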
PCT/CN2023/081461 2022-04-15 2023-03-14 High-frequency audio signal encoding and decoding method and related apparatuses Ceased WO2023197809A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210395889.2 2022-04-15
CN202210395889.2A CN114550732B (en) 2022-04-15 2022-04-15 Coding and decoding method and related device for high-frequency audio signal

Publications (1)

Publication Number Publication Date
WO2023197809A1 (en) 2023-10-19

Family

ID=81666757

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/081461 Ceased WO2023197809A1 (en) 2022-04-15 2023-03-14 High-frequency audio signal encoding and decoding method and related apparatuses

Country Status (2)

Country Link
CN (1) CN114550732B (en)
WO (1) WO2023197809A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550732B (en) * 2022-04-15 2022-07-08 腾讯科技(深圳)有限公司 Coding and decoding method and related device for high-frequency audio signal
CN115116451B (en) * 2022-06-15 2024-11-08 腾讯科技(深圳)有限公司 Audio decoding, encoding method, device, electronic device and storage medium
TWI865895B (en) * 2022-07-19 2024-12-11 盛微先進科技股份有限公司 Audio compression system and audio compression method for wireless communication
EP4664455A1 (en) * 2023-02-09 2025-12-17 Beijing Xiaomi Mobile Software Co., Ltd. Audio signal processing method, apparatus, device, and storage medium
CN120431944A (en) * 2024-02-02 2025-08-05 北京字跳网络技术有限公司 Decoding and encoding methods, devices, electronic devices, media, products and systems

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH11133999A (en) * 1997-10-29 1999-05-21 Ricoh Co Ltd Audio encoding / decoding device
CN102074242A (en) * 2010-12-27 2011-05-25 武汉大学 Extraction system and method of core layer residual in speech audio hybrid scalable coding
CN113470667A (en) * 2020-03-11 2021-10-01 腾讯科技(深圳)有限公司 Voice signal coding and decoding method and device, electronic equipment and storage medium
CN114550732A (en) * 2022-04-15 2022-05-27 腾讯科技(深圳)有限公司 Coding and decoding method and related device for high-frequency audio signal

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101662288B (en) * 2008-08-28 2012-07-04 华为技术有限公司 Method, device and system for encoding and decoding audios
CN101710489B (en) * 2009-11-09 2011-11-30 清华大学 Method and device capable of encoding and decoding audio by grade and encoding and decoding system
EP3249647B1 (en) * 2010-12-29 2023-10-18 Samsung Electronics Co., Ltd. Apparatus and method for encoding for high-frequency bandwidth extension
MX353240B (en) * 2013-06-11 2018-01-05 Fraunhofer Ges Forschung Device and method for bandwidth extension for acoustic signals.
JP6863359B2 (en) * 2014-03-24 2021-04-21 ソニーグループ株式会社 Decoding device and method, and program
JP6439296B2 (en) * 2014-03-24 2018-12-19 ソニー株式会社 Decoding apparatus and method, and program
WO2019091576A1 (en) * 2017-11-10 2019-05-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio encoders, audio decoders, methods and computer programs adapting an encoding and decoding of least significant bits
IL319703A (en) * 2018-04-25 2025-05-01 Dolby Int Ab Integration of high frequency reconstruction techniques with reduced post-processing delay
WO2020185522A1 (en) * 2019-03-14 2020-09-17 Boomcloud 360, Inc. Spatially aware multiband compression system with priority
JP6691251B2 (en) * 2019-04-05 2020-04-28 株式会社Nttドコモ Speech decoding device, speech decoding method, and speech decoding program
US11380343B2 (en) * 2019-09-12 2022-07-05 Immersion Networks, Inc. Systems and methods for processing high frequency audio signal
CN112530444B (en) * 2019-09-18 2023-10-03 华为技术有限公司 Audio coding method and device
CN113593586B (en) * 2020-04-15 2025-01-10 华为技术有限公司 Audio signal encoding method, decoding method, encoding device and decoding device
CN112767954B (en) * 2020-06-24 2024-06-14 腾讯科技(深圳)有限公司 Audio encoding and decoding method, device, medium and electronic equipment
CN113963703B (en) * 2020-07-03 2025-05-02 华为技术有限公司 Audio encoding method and encoding and decoding device
CN114333861B (en) * 2021-11-18 2025-07-11 腾讯科技(深圳)有限公司 Audio processing method, device, storage medium, equipment and product

Also Published As

Publication number Publication date
CN114550732A (en) 2022-05-27
CN114550732B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
US11727946B2 (en) Method, apparatus, and system for processing audio data
US11526734B2 (en) Method and apparatus for recurrent auto-encoding
WO2023197809A1 (en) High-frequency audio signal encoding and decoding method and related apparatuses
CN105793924B (en) Audio decoder and method for providing decoded audio information using error concealment
RU2408089C2 (en) Decoding predictively coded data using buffer adaptation
WO2014117458A1 (en) Prediction method and coding/decoding device for high frequency band signal
JP2010520512A (en) Method and apparatus for performing steady background noise smoothing
CN106133832A (en) Apparatus and method for switching decoding techniques at a device
HK40070387B (en) Method for encoding and decoding high-frequency audio signal, and related apparatus
HK40070387A (en) Method for encoding and decoding high-frequency audio signal, and related apparatus
US12057130B2 (en) Audio signal encoding method and apparatus, and audio signal decoding method and apparatus
HK40086102A (en) Speech bandwidth expansion method and related apparatus
CN116110424A (en) Voice bandwidth expansion method and related device
HK1199543B (en) Audio data processing method and apparatus
HK1199540B (en) Forecasting method for high-frequency band signal, encoding device and decoding device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23787457

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM1205A DATED 19.02.2025)

122 Ep: pct application non-entry in european phase

Ref document number: 23787457

Country of ref document: EP

Kind code of ref document: A1