Disclosure of Invention
In view of the above, the present application provides a method and a system for identifying an audio scene, a method and a system for driving a motor, and an electronic device, so as to solve the problem that existing racing game products are limited in the user perception signals they provide.
The first aspect of the present application provides a method for identifying an audio scene, including:
Acquiring audio data to be processed;
Dividing the audio data to be processed into a plurality of frame audio units which are continuous in time sequence;
Filtering each frame of audio unit according to the wave band characteristics corresponding to the audio scene to obtain target audio;
Acquiring frame number count and energy mean value of each frame of audio unit in the target audio, wherein the frame number count is used for representing the characteristics of a specific scene;
and comparing the frame number count or the energy mean value with characteristic thresholds corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit.
In one embodiment, the target audio comprises a first target audio and a second target audio, wherein the frame count of each frame of audio unit in the first target audio is a first frame count, the energy mean value is a first mean value, the frame count of each frame of audio unit in the second target audio is a second frame count, and the characteristic threshold comprises a first trigger threshold and a minimum frame count;
comparing the frame number count or the energy mean value with characteristic thresholds corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit comprises:
setting a first trigger threshold value of each frame according to the value characteristic of the first frame number count of each frame in the first target audio, and judging that an audio unit with a first average value larger than the first trigger threshold value is generated in a first audio scene;
And determining that the audio unit with the second frame number count being greater than or equal to the minimum frame number count is generated in a second audio scene, wherein the second audio scene is a scene with the speed of the control target changed in a second direction.
In one embodiment, the first trigger threshold includes an incremental primary trigger threshold, a mid-level trigger threshold, and a high-level trigger threshold;
The setting the first trigger threshold of each frame according to the value feature of the first frame count of each frame in the first target audio includes:
If GAIN_CNT(n) < a*GAIN_CNT_STEP, setting the first trigger threshold to the primary trigger threshold, wherein GAIN_CNT(n) represents the first frame count of the current frame, GAIN_CNT_STEP represents an interval threshold, a is a positive number, the symbol * represents multiplication, and the interval threshold describes the interval between the successive levels of the first trigger threshold;
If a*GAIN_CNT_STEP ≤ GAIN_CNT(n) < 2a*GAIN_CNT_STEP, setting the first trigger threshold to the mid-level trigger threshold;
If 2a*GAIN_CNT_STEP ≤ GAIN_CNT(n) < 3a*GAIN_CNT_STEP, setting the first trigger threshold to the high-level trigger threshold.
In one embodiment, before comparing the frame count or the energy average value with the feature threshold corresponding to different audio scenes to determine the audio scene corresponding to each frame of audio unit, the method further includes:
Determining a first frame count of the current frame according to the first frame count of the previous frame, the first average value of the current frame and the interval threshold value, wherein the previous frame is a frame before the current frame;
and/or,
And determining a second frame count of the current frame according to the second frame count of the previous frame, a second average value of the current frame and a second trigger threshold, wherein the second average value is an energy average value of a corresponding audio unit in the second target audio.
Specifically, the determining the first frame count of the current frame according to the first frame count of the previous frame, the first average value of the current frame and the interval threshold value includes:
If GAIN_CNT(n-1) < a*GAIN_CNT_STEP: when AVE_L(n) > b1 is not satisfied but AVE_L(n) < b1 and GAIN_CNT(n-1) > 0, updating GAIN_CNT(n) with a second update formula; and when AVE_L(n) > b2, updating GAIN_CNT(n) with a first update formula; wherein GAIN_CNT(n-1) represents the first frame count of the previous frame, GAIN_CNT(n) represents the first frame count of the current frame, GAIN_CNT_STEP represents the interval threshold, a is a positive number, the symbol * represents multiplication, AVE_L(n) represents the first average value of the current frame, b1 represents a first average value evaluation parameter, b2 represents a second average value evaluation parameter, b3 represents a third average value evaluation parameter, b4 represents a fourth average value evaluation parameter, the first update formula is used to increase the first frame count, and the second update formula is used to decrease the first frame count;
If a*GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 2a*GAIN_CNT_STEP: when AVE_L(n) > b3, updating GAIN_CNT(n) with the first update formula; and when AVE_L(n) < b2, updating GAIN_CNT(n) with the second update formula;
If 2a*GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 3a*GAIN_CNT_STEP: when AVE_L(n) > b4, updating GAIN_CNT(n) with the first update formula; when AVE_L(n) < b3, updating GAIN_CNT(n) with the second update formula; and if GAIN_CNT(n) is equal to 3a*GAIN_CNT_STEP, setting GAIN_CNT(n) to GAIN_CNT(n) - c1, wherein c1 represents a first step value.
Specifically, the first update formula is GAIN_CNT(n) = GAIN_CNT(n-1) + c2, and the second update formula is GAIN_CNT(n) = GAIN_CNT(n-1) - c3, wherein c2 represents a second step value and c3 represents a third step value.
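The tiered update of the first frame count can be illustrated with a minimal Python sketch. It follows the three tiers and the two update formulas stated above. The default values of a, GAIN_CNT_STEP, b1 to b4 and c1 to c3 are hypothetical and are not taken from the application, and holding the previous count when neither condition of a tier is met is an assumption, since that case is not stated.

```python
from dataclasses import dataclass


@dataclass
class GainCntParams:
    """Hypothetical values; the application does not specify any of them."""
    a: int = 1               # positive factor a
    gain_cnt_step: int = 10  # GAIN_CNT_STEP, the interval threshold
    b1: float = 100.0        # first average value evaluation parameter
    b2: float = 200.0        # second average value evaluation parameter
    b3: float = 300.0        # third average value evaluation parameter
    b4: float = 400.0        # fourth average value evaluation parameter
    c1: int = 1              # first step value
    c2: int = 1              # second step value (increase)
    c3: int = 1              # third step value (decrease)


def update_gain_cnt(prev_cnt: int, ave_l: float, p: GainCntParams) -> int:
    """Return GAIN_CNT(n) from GAIN_CNT(n-1) and AVE_L(n)."""
    unit = p.a * p.gain_cnt_step
    cnt = prev_cnt  # assumption: hold the previous count if no condition fires
    if prev_cnt < unit:
        if ave_l > p.b2:
            cnt = prev_cnt + p.c2            # first update formula
        elif ave_l < p.b1 and prev_cnt > 0:
            cnt = prev_cnt - p.c3            # second update formula
    elif prev_cnt < 2 * unit:
        if ave_l > p.b3:
            cnt = prev_cnt + p.c2
        elif ave_l < p.b2:
            cnt = prev_cnt - p.c3
    elif prev_cnt < 3 * unit:
        if ave_l > p.b4:
            cnt = prev_cnt + p.c2
        elif ave_l < p.b3:
            cnt = prev_cnt - p.c3
        if cnt == 3 * unit:                  # keep the count below 3a*GAIN_CNT_STEP
            cnt -= p.c1
    return cnt
```

With these hypothetical defaults, a higher tier requires a larger first average value to keep counting up, so a sustained loud passage raises the count step by step while a quiet passage lets it fall back.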
Specifically, the determining the second frame count of the current frame according to the second frame count of the previous frame, the second average value of the current frame and the second trigger threshold includes:
If AVE_R(n) > BP_ATT, updating BP_CNT(n) with a third update formula, and setting BP_CNT(n) to a maximum frame count value when BP_CNT(n) is greater than the maximum frame count value, wherein AVE_R(n) represents the second average value of the current frame, BP_ATT represents the second trigger threshold, BP_CNT(n) represents the second frame count of the current frame, and the third update formula is used to increase the second frame count;
If AVE_R(n) ≤ BP_ATT and BP_CNT(n-1) is positive, updating BP_CNT(n) with a fourth update formula, wherein BP_CNT(n-1) represents the second frame count of the previous frame, and the fourth update formula is used to decrease the second frame count.
Specifically, the third update formula is BP_CNT(n) = BP_CNT(n-1) + c4, and the fourth update formula is BP_CNT(n) = BP_CNT(n-1) - c5, wherein c4 represents a fourth step value and c5 represents a fifth step value.
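The update of the second frame count can be sketched in the same way. The function below applies the third and fourth update formulas and the cap at the maximum frame count value. The default step values and the maximum of 20 are hypothetical, and keeping the previous count when AVE_R(n) ≤ BP_ATT and BP_CNT(n-1) is not positive is an assumption.

```python
def update_bp_cnt(prev_cnt: int, ave_r: float, bp_att: float,
                  c4: int = 1, c5: int = 1, max_cnt: int = 20) -> int:
    """Return BP_CNT(n) from BP_CNT(n-1) and AVE_R(n).

    c4, c5 and max_cnt are hypothetical defaults, not values from the source.
    """
    if ave_r > bp_att:
        # Third update formula, capped at the maximum frame count value.
        return min(prev_cnt + c4, max_cnt)
    if prev_cnt > 0:
        # Fourth update formula.
        return prev_cnt - c5
    # Assumption: a non-positive count is held as it is.
    return prev_cnt
```

Because the count rises only while the band-pass energy stays above BP_ATT and decays otherwise, a single loud frame does not by itself reach a minimum frame count used for scene detection.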
In one embodiment, the filtering the audio units of each frame according to the band characteristics corresponding to the audio scene, and obtaining the target audio includes:
Acquiring first channel data and second channel data of each frame of audio unit;
And carrying out band-pass filtering on the second channel data to obtain second target audio.
A second aspect of the present application provides a motor driving method, comprising:
According to any one of the above audio scene recognition methods, recognizing an audio scene corresponding to the currently played audio unit;
acquiring a corresponding vibration rule according to the audio scene;
and driving the motor to vibrate according to the vibration rule, so as to realize the vibration effect corresponding to the current played audio unit.
In one embodiment, the vibration rules include a first vibration rule and a second vibration rule;
The obtaining the corresponding vibration rule according to the audio scene comprises determining a first vibration rule according to a first average value and a first trigger threshold value of the current frame if the audio unit of the current frame is generated in the first audio scene, and determining a second vibration rule according to a second frame number count of the current frame if the audio unit of the current frame is generated in the second audio scene.
The determining the first vibration rule according to the first average value and the first trigger threshold of the current frame comprises: if AVE_L(n) > MAX_THR, setting the amplitude of the current frame to a second amplitude value and setting a maximum vibration sense flag bit to a first flag; when the maximum vibration sense flag bit is a second flag and GAIN(n-1) < GAIN_MAX, controlling the amplitude of the current frame according to a vibration sense climbing rule; and when the maximum vibration sense flag bit is the first flag and GAIN(n-1) > GAIN_MAX, setting the amplitude of the current frame to the maximum amplitude of the motor; wherein AVE_L(n) represents the first average value of the current frame, MAX_THR represents the maximum trigger threshold, and MOVING_THR represents the first trigger threshold;
and/or,
The determining the second vibration rule according to the second frame number count of the current frame comprises setting the amplitude of the current frame to be a first amplitude value if BP_CNT (n) is equal to or greater than m, wherein BP_CNT (n) represents the second frame number count of the current frame, and m represents a count threshold of the second audio scene.
Specifically, the first vibration rule further includes:
If AVE_L (n) > MOVING_THR, the maximum vibration sense flag bit is the first flag, and GAIN (n-1) is larger than the third amplitude value, the amplitude of the current frame is set to the first amplitude value.
Specifically, the controlling the amplitude of the current frame according to the vibration sense climbing rule includes:
acquiring the climbing frame count, looking up in a climbing control matrix the amplitude value whose position corresponds to the climbing frame count, and determining the amplitude of the current frame as the sum of the looked-up amplitude value and GAIN(n-1).
Specifically, after controlling the amplitude increment of the current frame according to the vibration sense climbing rule, the method further comprises:
when GAIN(n) > GAIN_MAX, setting the amplitude of the current frame to the maximum amplitude of the motor, setting the maximum vibration sense flag bit to the first flag, and incrementing the climbing frame count, wherein GAIN(n) represents the amplitude of the current frame.
Specifically, the above motor driving method further includes:
and if the climbing frame number count is greater than the climbing frequency threshold, setting the climbing frame number count as the climbing frequency threshold.
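One climbing step of the vibration sense climbing rule can be sketched as follows. The sketch rests on several assumptions that the text above does not settle: the climbing control matrix is modelled as a one-dimensional list of increments, the maximum amplitude of the motor is taken to equal GAIN_MAX, and the climbing frame count is incremented on every climbing frame. All names and values are hypothetical.

```python
from typing import List, Tuple


def climb_step(prev_gain: float, climb_cnt: int, ramp: List[float],
               gain_max: float, climb_cnt_max: int) -> Tuple[float, int, bool]:
    """Return (GAIN(n), new climbing frame count, reached_max).

    ramp stands in for the climbing control matrix (assumed 1-D).
    reached_max corresponds to setting the maximum vibration sense
    flag bit to the first flag.
    """
    if not ramp:
        raise ValueError("ramp must contain at least one increment")
    # Look up the increment at the position given by the climbing frame count.
    idx = min(climb_cnt, len(ramp) - 1)
    gain = prev_gain + ramp[idx]
    reached_max = False
    if gain > gain_max:
        gain = gain_max          # assumption: motor maximum equals GAIN_MAX
        reached_max = True
    # Increment the count, then clamp it to the climbing threshold.
    climb_cnt = min(climb_cnt + 1, climb_cnt_max)
    return gain, climb_cnt, reached_max
```

Calling the function once per frame with the previous amplitude and count gives a ramp whose slope is set by the entries of the lookup list, and the clamp keeps the lookup position from running past the configured threshold.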
A third aspect of the present application provides an identification system for an audio scene, comprising:
The first acquisition module is used for acquiring the audio data to be processed;
the segmentation module is used for segmenting the audio data to be processed into a plurality of frame audio units which are continuous in time sequence;
The filtering module is used for carrying out filtering processing on each frame of audio unit according to the wave band characteristics corresponding to the audio scene to obtain target audio;
The second acquisition module is used for acquiring the frame number count and the energy mean value of each frame of audio unit in the target audio, wherein the frame number count is used for representing the characteristics of a specific scene;
And the judging module is used for comparing the frame number count or the energy mean value with characteristic thresholds corresponding to different audio scenes and judging the audio scene corresponding to each frame of audio unit.
A fourth aspect of the present application provides a motor driving system, comprising:
The identification module is used for identifying, by means of any one of the above audio scene identification systems, the audio scene corresponding to the currently played audio unit;
the third acquisition module is used for acquiring corresponding vibration rules according to the audio scene;
and the driving module is used for driving the motor to vibrate according to the vibration rule so as to realize the vibration effect corresponding to the current played audio unit.
A fifth aspect of the present application provides an electronic device, including a processor and a storage medium, where the storage medium has program code stored thereon, and where the processor is configured to invoke the program code stored on the storage medium to perform any of the above-mentioned methods for identifying an audio scene.
In one embodiment, the electronic device further comprises a motor, and the processor is further configured to execute any one of the motor driving methods.
According to the above method and system for identifying an audio scene and the above electronic device, the audio data to be processed is acquired and divided into multiple frames of audio units that are continuous in time sequence, so that the audio scene is identified frame by frame and the scene corresponding to each frame of audio unit can be identified accurately. Each frame of audio unit is filtered according to the band characteristics corresponding to the audio scene to obtain target audio; because the target audio contains data that effectively characterizes the state of the control target, the accuracy of identifying the audio scene from the target audio is improved. The frame count and the energy mean value of each frame of audio unit in the target audio are then acquired, the frame count or the energy mean value is compared with the characteristic thresholds corresponding to different audio scenes, and the audio scene corresponding to each frame of audio unit is determined. Vibration signals and other user perception signals can therefore be set for specific audio scenes, so that the corresponding game product provides more comprehensive user perception signals, the user's feeling during the game is enhanced, and the user experience is improved.
Detailed Description
The technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application. The embodiments described below and their technical features can be combined with one another where no conflict arises.
A first aspect of the present application provides a method for identifying an audio scene, as shown in fig. 1, where the method for identifying an audio scene includes:
S100, obtaining the audio data to be processed.
The audio data to be processed can be derived from an audio signal of a racing game client running on an intelligent terminal such as a mobile phone or from an audio signal of a racing game provided by a game terminal such as a racing game machine. The audio signal may be an audio data stream outputted after decoding of audio files of various formats.
S200, dividing the audio data to be processed into multiple frames of audio units that are continuous in time sequence.
Specifically, in this step, the audio signal of the racing game may be sampled at a set sampling frequency and a set sampling bit depth to obtain the audio data to be processed. The audio data stream is then divided into frames with a step of N audio sampling points, and any audio unit with fewer than N audio sampling points is zero-padded, yielding multiple frames of audio units that are continuous in time sequence. The audio data to be processed thus comprises multiple frames of audio units, and each frame of audio unit comprises N audio sampling points.
Optionally, the set sampling frequency and the set sampling bit depth may be chosen according to the accuracy required for the subsequent game scene recognition; for example, the sampling frequency may be set to 48 kHz and the sampling bit depth to 16 bits. In that case, if the game client is QQ galloping running on a mobile phone, the audio signal of QQ galloping can be obtained from the corresponding mobile phone system, and the original audio signal is sampled at a sampling rate of 48 kHz and a sampling depth of 16 bits to obtain the audio data to be processed. The value of N may be an integer power of 2, such as 1024.
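The framing in step S200 can be illustrated with a short Python sketch that splits a sample stream into frames of N sampling points and zero-pads the last frame. The function name and the use of plain lists are illustrative assumptions; N = 1024 is the example value given above.

```python
from typing import List


def split_into_frames(samples: List[int], frame_len: int = 1024) -> List[List[int]]:
    """Split samples into consecutive frames of frame_len points.

    A final frame with fewer than frame_len points is padded with zeros.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame.extend([0] * (frame_len - len(frame)))  # zero-pad a short frame
        frames.append(frame)
    return frames
```

At 48 kHz, a frame of 1024 sampling points covers about 21.3 ms of audio, which sets the time resolution of the frame-by-frame scene decision.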
S300, filtering processing is carried out on each frame of audio unit according to the band characteristics corresponding to the audio scene, and target audio is obtained.
In this step, the filtering mode and the filtering parameters can be set according to the band characteristics of the audio in the audio scene to be identified, so as to reduce noise data in the target audio. The target audio then retains, as far as possible, the state change characteristics of the control target in the corresponding scene, which improves the accuracy of the subsequent audio scene identification. For example, if the noise of a certain audio scene lies mainly below a certain frequency, and that band contains hardly any effective audio features of the scene, each frame of audio unit can be high-pass filtered so that the target audio contains as little of that noise as possible.
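As an illustration of the high-pass case, the sketch below applies a first-order RC high-pass filter to one frame. The filter type, the cutoff frequency and the function name are assumptions made for this example; the application only states that high-pass filtering may be used and does not specify a filter design.

```python
import math
from typing import List


def high_pass(frame: List[float], cutoff_hz: float,
              sample_rate_hz: float = 48000.0) -> List[float]:
    """First-order RC high-pass filter applied to a single frame.

    The filter state is reset at the start of each frame, which is a
    simplification; a streaming version would carry state across frames.
    """
    if not frame:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [float(frame[0])]
    for i in range(1, len(frame)):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        out.append(alpha * (out[-1] + frame[i] - frame[i - 1]))
    return out
```

A constant (zero-frequency) input decays toward zero at the output, while rapidly alternating samples pass almost unchanged, which is the behaviour needed to suppress low-frequency noise.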
S400, obtaining the frame number count and the energy mean value of each frame of audio unit in the target audio, wherein the frame number count is used for representing the characteristics of a specific scene.
The specific scene may be a scene in which a perception signal other than the existing auditory and/or visual signals needs to be provided, such as a collision scene that calls for a sense of collision or an acceleration scene that calls for a sense of acceleration. The features of the specific scene may include a background music feature of the scene and a state change feature of the control target in the scene. The background music feature may include a music type and/or a music intensity, and the state change feature may include how the state of the control target changes in the corresponding game scene, such as a speed change feature and/or a position change feature. In racing games, the state change of the control target in most scenes appears as a speed change, so in these scenes the state change feature may be a speed change feature, which may include the direction of the speed change, the magnitude of the speed change, and other quantities that characterize the speed change of the control target in the specific scene. The target audio comprises consecutive frames of audio units, each frame of audio unit comprises N audio sampling points, and each sampling point has a corresponding energy value, which can be determined from the amplitude of that sampling point. The absolute values of the energy values of the sampling points may be summed and the sum divided by the number N of audio sampling points to obtain an average energy value, which is the energy mean of that frame of audio unit.
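The energy mean described above can be computed as in the following sketch, which assumes that the energy value of a sampling point is simply its amplitude, so that the energy mean is the mean absolute amplitude of the frame. The function name is illustrative.

```python
from typing import Sequence


def energy_mean(frame: Sequence[float]) -> float:
    """Sum the absolute sample values and divide by the number of samples N."""
    if not frame:
        raise ValueError("frame must contain at least one sample")
    return sum(abs(s) for s in frame) / len(frame)
```

For a zero-padded final frame the padding lowers the mean, because N still counts the padded sampling points.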
The frame count is a parameter that represents the background music feature and the state change feature of the control target in the target audio. Different rules for determining the frame count can be set for different game scenes; in most game scenes, the frame count of a given frame of audio unit can be determined from the state parameters of the control target in that frame and/or in several preceding frames. Specifically, in some game scenes, an initial value of the frame count may be preset and used as the previous frame count for the first frame of audio unit; state parameters such as the energy mean of each frame of audio unit are then identified, and the frame count of each frame is determined in combination with the features of the preceding frame or frames. In one example, for the audio units generated in some scenes, the frame count of the current frame may be determined by increasing, decreasing or maintaining the frame count of the previous frame according to the energy mean of the current frame.
For example, when the energy mean of a frame is greater than a first energy threshold, its frame count is the frame count of the previous frame increased by one or more count units; when the energy mean is less than a second energy threshold, its frame count is the frame count of the previous frame decreased by one or more count units; and when the energy mean is greater than or equal to the second energy threshold and less than or equal to the first energy threshold, its frame count is set to the frame count of the previous frame, wherein the first energy threshold is greater than the second energy threshold. In another example, for the audio units generated in other scenes, the frame count of the preceding frame or frames may first be examined, and once it satisfies a certain condition, the frame count of the current frame is determined by increasing, decreasing or maintaining the frame count of the previous frame according to the energy mean. Optionally, where the frame count of the current frame is obtained by decreasing the frame count of the previous frame, a minimum value of the frame count (e.g., 0) may be set for the corresponding game scene; when the frame count of the previous frame has reached the minimum value, or would fall below it after the decrease, the frame count of the current frame is kept equal to that of the previous frame.
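The two-threshold example can be written as a small Python function. The parameter names, the step of one count unit and the minimum of 0 are illustrative assumptions consistent with the example values mentioned in the text.

```python
def update_frame_count(prev_count: int, energy: float,
                       first_thr: float, second_thr: float,
                       step: int = 1, min_count: int = 0) -> int:
    """Increase, decrease or hold the frame count based on the energy mean.

    first_thr must be greater than second_thr, as stated in the text.
    """
    if first_thr <= second_thr:
        raise ValueError("first_thr must be greater than second_thr")
    if energy > first_thr:
        return prev_count + step
    if energy < second_thr:
        # Hold the previous count when it is at the minimum already,
        # or when the decrease would take it below the minimum.
        if prev_count <= min_count or prev_count - step < min_count:
            return prev_count
        return prev_count - step
    return prev_count  # energy lies between the two thresholds
```

The gap between the two thresholds acts as a dead band, so small fluctuations of the energy mean around a single level do not make the count oscillate.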
S500, comparing the frame number count or the energy mean value with characteristic thresholds corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit.
The characteristic threshold may be set according to the audio variation characteristics of the corresponding audio scene. In some scenes it may take the form of one or more fixed values; in other scenes it may be adjusted in real time according to state parameters such as the frame count and/or the energy mean of the relevant audio unit, so that the audio scene or game scene in which the corresponding audio unit lies can be determined accurately.
According to the above method for identifying an audio scene, the audio data to be processed is acquired and divided into multiple frames of audio units that are continuous in time sequence, so that the audio scene is identified frame by frame and the scene corresponding to each frame of audio unit can be identified accurately. Each frame of audio unit is filtered according to the band characteristics corresponding to the audio scene to obtain target audio; because the target audio contains data that effectively characterizes the state of the control target, the accuracy of identifying the audio scene from the target audio is improved. The frame count and the energy mean of each frame of audio unit in the target audio are then acquired, the frame count or the energy mean is compared with the characteristic thresholds corresponding to different audio scenes, and the audio scene corresponding to each frame of audio unit is determined. Vibration signals, shaking signals and other user perception signals can therefore be set for specific audio scenes, so that the corresponding game product provides more comprehensive user perception signals, the user's sense of participation in the game is enhanced, and the user experience is improved.
In one embodiment, the target audio comprises a first target audio and a second target audio, wherein the frame count of each frame audio unit in the first target audio is a first frame count, the energy average value is a first average value, the frame count of each frame audio unit in the second target audio is a second frame count, and the characteristic threshold comprises a first trigger threshold and a minimum frame count value;
comparing the frame number count or the energy mean value with characteristic thresholds corresponding to different audio scenes, and judging the audio scene corresponding to each frame of audio unit comprises:
setting a first trigger threshold value of each frame according to the value characteristic of the first frame number count of each frame in the first target audio, and judging that an audio unit with a first average value larger than the first trigger threshold value is generated in a first audio scene;
And determining that the audio unit with the second frame number count being greater than or equal to the minimum frame number count is generated in a second audio scene, wherein the second audio scene is a scene with the speed of the control target changed in a second direction.
In racing games, the specific scenes that need perception signals beyond sight and sound are usually scenes in which the speed of the control target (the racing car) changes, such as small nitrogen acceleration, large nitrogen acceleration, collision, passing through an acceleration zone, and drifting. The speed changes in these scenes occur mainly in two directions. The first is the direction parallel to the running direction of the control target, as in small nitrogen acceleration, large nitrogen acceleration, collision, or passing through an acceleration zone; this direction is set as the first direction, these scenes are called first audio scenes, and the corresponding target audio is called the first target audio. The second is a direction at an angle to the running direction of the control target (for example, perpendicular to it), as in drifting or scratching; this direction is set as the second direction, these scenes are called second audio scenes, and the corresponding target audio is called the second target audio. The first target audio and the second target audio can each be obtained by filtering the audio data to be processed according to the band characteristics of the corresponding audio scene. Like the audio data to be processed, each of them comprises multiple frames of audio units that are continuous in time sequence, and each frame of audio unit comprises N audio sampling points.
For the first average value of each frame of audio unit, the first frame count of the previous frame or of several previous frames is examined, and according to the result the first frame count of the frame can be determined by increasing, decreasing or maintaining the previous frame count. For the second target audio, the absolute values of the energy values of all sampling points in a frame of audio unit are summed and the sum is divided by the number N of audio sampling points; the resulting average energy value is the energy mean of that frame of audio unit, namely the second average value. Likewise, for the second average value of each frame of audio unit, the second frame count of the previous frame or of several previous frames is examined, and according to the result the second frame count of the frame can be determined by increasing, decreasing or maintaining the previous frame count.
If a frame of audio unit is generated in the first audio scene, the corresponding game scene is a first game scene in which the speed changes in the first direction, such as small nitrogen acceleration, large nitrogen acceleration, collision, or passing through an acceleration zone. A specific user perception signal can then be set according to parameters such as the first frame count and the first average value of that frame and the first frame count of adjacent frames, together with the characteristics of the corresponding terminal device. For example, for a smart mobile device, a corresponding vibration signal can be set according to these parameters, so that the device vibrates accordingly when playing that frame of audio unit. For a game machine with a racing car model, a model shaking signal representing actions such as accelerated running and/or collision of the racing car model can be set according to the same parameters, so that the racing car model shakes accordingly when the game machine plays that frame of audio unit. More comprehensive game perception is thereby provided, and the user can perceive the state change of the control target in the first direction in more ways during the game.
If a frame of audio unit is generated in the second audio scene, the corresponding game scene is a second game scene in which the speed changes in the second direction, such as drifting. Other user perception signals can then be set according to the second frame count of that frame and the characteristics of the corresponding terminal device. For example, for a smart mobile device, a corresponding vibration signal can be set according to parameters such as the second frame count, so that the device vibrates accordingly when playing that frame of audio unit of the game. For a game machine with a racing car model, a model shaking signal representing actions such as drifting of the racing car model can be set according to the second frame count, so that the racing car model shakes accordingly when that frame of audio unit is played. More comprehensive game perception is thereby provided, and the user can perceive the state change of the control target in the second direction in more ways during the game.
In this embodiment, the first trigger threshold of each frame is set according to the value characteristic of the first frame number count, so that the value of the first trigger threshold can be determined according to the background music characteristic and the control target speed change characteristic represented by the corresponding audio unit, which improves the accuracy of identifying the first audio scene according to the first trigger threshold. In the second audio scene, the minimum frame number count value may be set according to the speed change characteristic, in the second direction, of a specific action of the control target in the racing game; for example, the minimum frame number count value may be set to 6 according to the speed change characteristic of the drifting action in the second direction. If the second frame number count of a certain frame of audio unit is greater than or equal to the minimum frame number count value, the frame of audio unit is generated in the second audio scene.
Specifically, the first trigger threshold includes a primary trigger threshold, an intermediate trigger threshold, and an advanced trigger threshold, which increase in that order;
The setting the first trigger threshold of each frame according to the value characteristic of the first frame count of each frame in the first target audio includes:
If GAIN_CNT(n) < a*GAIN_CNT_STEP, setting the first trigger threshold to the primary trigger threshold, wherein GAIN_CNT(n) represents the first frame count of the current frame, GAIN_CNT_STEP represents an interval threshold, a is a positive number, and the symbol * represents multiplication; the interval threshold describes the interval between the levels of the first trigger threshold;
If a*GAIN_CNT_STEP ≤ GAIN_CNT(n) < 2a*GAIN_CNT_STEP, setting the first trigger threshold to the intermediate trigger threshold;
If 2a*GAIN_CNT_STEP ≤ GAIN_CNT(n) < 3a*GAIN_CNT_STEP, setting the first trigger threshold to the advanced trigger threshold.
The primary trigger threshold, the intermediate trigger threshold, and the advanced trigger threshold can be set according to the background music characteristics adopted by a specific game scene and the characteristics of the actions in which the speed of the control target changes in the first direction. Specifically, the primary trigger threshold, the intermediate trigger threshold, and the advanced trigger threshold increase in that order, and the advanced trigger threshold is smaller than the maximum trigger threshold; for example, the primary trigger threshold is 3500, the intermediate trigger threshold is 4000, the advanced trigger threshold is 4500, and the maximum trigger threshold is 7000. The interval threshold GAIN_CNT_STEP is the frame count interval assigned to each level of the first trigger threshold (the primary, intermediate, and advanced trigger thresholds). It may be determined according to the update rule of the first frame count and may, for example, be set to a positive value such as 20. The value of a can be determined according to the value of the interval threshold GAIN_CNT_STEP. In one example, when GAIN_CNT_STEP is 20, a can be 1; if the primary trigger threshold is 3500, the intermediate trigger threshold is 4000, the advanced trigger threshold is 4500, and the maximum trigger threshold is 7000, then:
If GAIN_CNT(n) < 20, then MOVING_THR = 3500, wherein MOVING_THR represents the first trigger threshold;
If 20 ≤ GAIN_CNT(n) < 40, then MOVING_THR = 4000;
If 40 ≤ GAIN_CNT(n) < 60, then MOVING_THR = 4500.
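The threshold selection above can be sketched as follows. This is a minimal illustration, assuming the example values from the text (3500, 4000, 4500, GAIN_CNT_STEP = 20, a = 1). The function names and the error raised for counts outside the range covered by the text are assumptions made for the sketch and are not part of the original description.

```python
# Example values taken from the text; they are tunable in practice.
PRIMARY_THR = 3500
INTERMEDIATE_THR = 4000
ADVANCED_THR = 4500
GAIN_CNT_STEP = 20  # interval threshold
A = 1               # positive scaling factor "a"


def moving_threshold(gain_cnt):
    """Return MOVING_THR for the first frame count GAIN_CNT(n)."""
    step = A * GAIN_CNT_STEP
    if gain_cnt < step:
        return PRIMARY_THR
    if gain_cnt < 2 * step:
        return INTERMEDIATE_THR
    if gain_cnt < 3 * step:
        return ADVANCED_THR
    # The text defines no level at or above 3*a*GAIN_CNT_STEP
    # (assumption: treat such a count as an error).
    raise ValueError("GAIN_CNT out of the range covered by the text")


def is_first_scene(ave_l, gain_cnt):
    """A frame belongs to the first audio scene if AVE_L(n) > MOVING_THR."""
    return ave_l > moving_threshold(gain_cnt)
```

For instance, a frame with a first average value of 3600 is judged to be in the first audio scene while the count is below 20, but not once the count has risen into the 20 to 39 range, where the threshold becomes 4000.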
In one embodiment, before comparing the frame count or the energy average value with the feature threshold corresponding to different audio scenes to determine the audio scene corresponding to each frame of audio unit, the method further includes:
Determining a first frame count of the current frame according to the first frame count of the previous frame, the first average value of the current frame and the interval threshold value, wherein the previous frame is a frame before the current frame;
and/or,
And determining a second frame count of the current frame according to the second frame count of the previous frame, a second average value of the current frame and a second trigger threshold, wherein the second average value is an energy average value of a corresponding audio unit in the second target audio.
In this embodiment, the first frame count of the current frame is determined according to the first frame count of the previous frame, the first average value of the current frame, and the interval threshold, so that the determined first frame count more accurately represents the background music characteristic and the speed change characteristic of the control target in the first direction in the corresponding audio unit. Likewise, the second frame count of the current frame is determined according to the second frame count of the previous frame, the second average value of the current frame, and the second trigger threshold, so that the determined second frame count more accurately represents the speed change characteristic of the control target in the second direction in the corresponding audio unit.
In one example, the determining the first frame count of the current frame according to the first frame count of the previous frame, the first average value of the current frame, and the interval threshold includes:
If GAIN_CNT(n-1) < a*GAIN_CNT_STEP, updating GAIN_CNT(n) with a first update formula when AVE_L(n) > b2, and updating GAIN_CNT(n) with a second update formula when AVE_L(n) < b1 and GAIN_CNT(n-1) > 0, wherein GAIN_CNT(n-1) represents the first frame count of the previous frame, GAIN_CNT(n) represents the first frame count of the current frame, GAIN_CNT_STEP represents the interval threshold, a is a positive number, the symbol * represents multiplication, AVE_L(n) represents the first average value of the current frame, b1 represents a first average value evaluation parameter, b2 represents a second average value evaluation parameter, b3 represents a third average value evaluation parameter, b4 represents a fourth average value evaluation parameter, the first update formula is used to increase the first frame count, and the second update formula is used to decrease the first frame count;
If a*GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 2a*GAIN_CNT_STEP, updating GAIN_CNT(n) with the first update formula when AVE_L(n) > b3, and updating GAIN_CNT(n) with the second update formula when AVE_L(n) < b2;
If 2a*GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 3a*GAIN_CNT_STEP, updating GAIN_CNT(n) with the first update formula when AVE_L(n) > b4, and updating GAIN_CNT(n) with the second update formula when AVE_L(n) < b3; and if GAIN_CNT(n) is equal to 3a*GAIN_CNT_STEP, setting GAIN_CNT(n) to GAIN_CNT(n) - c1, wherein c1 represents a first step value.
Specifically, the first update formula is GAIN_CNT(n) = GAIN_CNT(n-1) + c2, and the second update formula is GAIN_CNT(n) = GAIN_CNT(n-1) - c3, wherein c2 represents a second step value and c3 represents a third step value.
The first step value c1, the second step value c2, and the third step value c3 may each take a conventional count unit value (e.g., 1) or an integer multiple of the count unit value (e.g., 2). The first average value evaluation parameter b1, the second average value evaluation parameter b2, the third average value evaluation parameter b3, and the fourth average value evaluation parameter b4 may be set according to the background music type adopted by the corresponding game and the audio energy characteristics corresponding to the various actions of the control target in which its speed in the first direction changes. Specifically, b1, b2, b3, and b4 increase in that order; for example, b1 is 2800, b2 is 3200, b3 is 4000, and b4 is 5000. In this case, if c1 = c2 = c3 = 1 and a = 1, then:
If GAIN_CNT(n-1) < GAIN_CNT_STEP, then GAIN_CNT(n) = GAIN_CNT(n-1) + 1 when AVE_L(n) > 3200, and GAIN_CNT(n) = GAIN_CNT(n-1) - 1 when AVE_L(n) < 2800 and GAIN_CNT(n-1) > 0;
If GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 2*GAIN_CNT_STEP, then GAIN_CNT(n) = GAIN_CNT(n-1) + 1 when AVE_L(n) > 4000, and GAIN_CNT(n) = GAIN_CNT(n-1) - 1 when AVE_L(n) < 3200;
If 2*GAIN_CNT_STEP ≤ GAIN_CNT(n-1) < 3*GAIN_CNT_STEP, then GAIN_CNT(n) = GAIN_CNT(n-1) + 1 when AVE_L(n) > 5000, and GAIN_CNT(n) = GAIN_CNT(n-1) - 1 when AVE_L(n) < 4000;
If GAIN_CNT(n) = 3*GAIN_CNT_STEP, then GAIN_CNT(n) is set to GAIN_CNT(n) - 1, i.e., decremented by one.
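The update of the first frame count can be sketched as below, using the example values from the text (b1 to b4 = 2800, 3200, 4000, 5000; c1 = c2 = c3 = 1; a = 1; GAIN_CNT_STEP = 20). The function name is hypothetical. The text does not say what happens when neither condition of a band is met; the sketch assumes the count is carried over unchanged.

```python
GAIN_CNT_STEP = 20                      # interval threshold
A = 1                                   # scaling factor "a"
B1, B2, B3, B4 = 2800, 3200, 4000, 5000  # average value evaluation parameters
C1 = C2 = C3 = 1                        # step values


def update_gain_cnt(prev_cnt, ave_l):
    """Compute GAIN_CNT(n) from GAIN_CNT(n-1) and AVE_L(n)."""
    step = A * GAIN_CNT_STEP
    cnt = prev_cnt  # assumption: unchanged when no condition applies
    if prev_cnt < step:
        if ave_l > B2:
            cnt = prev_cnt + C2          # first update formula
        elif ave_l < B1 and prev_cnt > 0:
            cnt = prev_cnt - C3          # second update formula
    elif prev_cnt < 2 * step:
        if ave_l > B3:
            cnt = prev_cnt + C2
        elif ave_l < B2:
            cnt = prev_cnt - C3
    elif prev_cnt < 3 * step:
        if ave_l > B4:
            cnt = prev_cnt + C2
        elif ave_l < B3:
            cnt = prev_cnt - C3
    # Keep the count below 3*a*GAIN_CNT_STEP, as described in the text.
    if cnt == 3 * step:
        cnt -= C1
    return cnt
```

Because the comparison levels for increasing and decreasing differ within each band (for example 3200 and 2800 in the lowest band), the count does not oscillate when the energy mean hovers around a single level.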
In one example, the determining the second frame count of the current frame based on the second frame count of the previous frame, the second average value of the current frame, and the second trigger threshold includes:
If AVE_R(n) > BP_ATT, updating BP_CNT(n) with a third update formula, and setting BP_CNT(n) to a maximum frame count value when BP_CNT(n) is larger than the maximum frame count value, wherein AVE_R(n) represents the second average value of the current frame, BP_ATT represents the second trigger threshold, BP_CNT(n) represents the second frame count of the current frame, and the third update formula is used to increase the second frame count;
If AVE_R(n) ≤ BP_ATT, updating BP_CNT(n) with a fourth update formula when BP_CNT(n-1) is positive, wherein BP_CNT(n-1) represents the second frame count of the previous frame, and the fourth update formula is used to decrease the second frame count.
Specifically, the third update formula is BP_CNT(n) = BP_CNT(n-1) + c4, and the fourth update formula is BP_CNT(n) = BP_CNT(n-1) - c5, wherein c4 represents a fourth step value and c5 represents a fifth step value.
The fourth step value c4 and the fifth step value c5 may each take a conventional count unit value (e.g., 1) or an integer multiple of the count unit value (e.g., 2). The second trigger threshold BP_ATT may be set according to the characteristics of the action of the control target in the racing game in which its speed in the second direction changes; for example, the second trigger threshold may be set to 1400 for the drifting action. The maximum frame count value is the maximum value of the frame count in the second audio scene and may be set according to the relevant characteristics of a specific action in the second game scene; for example, the maximum frame count value is set to 10 for the drifting action.
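The second frame count update, together with the second-scene judgment described earlier (a second frame count greater than or equal to the minimum frame count value), can be sketched as follows. The values are the examples given in the text for drifting (BP_ATT = 1400, maximum 10, minimum 6, c4 = c5 = 1); the function names are hypothetical.

```python
BP_ATT = 1400      # second trigger threshold (example for drifting)
BP_CNT_MAX = 10    # maximum frame count value
BP_CNT_MIN = 6     # minimum frame count value for the second audio scene
C4 = C5 = 1        # fourth and fifth step values


def update_bp_cnt(prev_cnt, ave_r):
    """Compute BP_CNT(n) from BP_CNT(n-1) and AVE_R(n)."""
    if ave_r > BP_ATT:
        # Third update formula, clamped to the maximum frame count value.
        return min(prev_cnt + C4, BP_CNT_MAX)
    if prev_cnt > 0:
        # Fourth update formula, applied only while the count is positive.
        return prev_cnt - C5
    return prev_cnt


def is_second_scene(bp_cnt):
    """A frame belongs to the second audio scene if BP_CNT(n) >= minimum."""
    return bp_cnt >= BP_CNT_MIN


# Usage: six consecutive loud frames are needed before drift is reported.
cnt = 0
history = []
for _ in range(7):
    cnt = update_bp_cnt(cnt, 2000)
    history.append(is_second_scene(cnt))
```

With these values a single loud frame does not trigger the second audio scene; the count acts as a debounce, so only band-pass energy sustained over several frames is reported as drifting.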
In one embodiment, the filtering each frame of audio unit according to the band characteristics corresponding to the audio scene to obtain the target audio includes:
Acquiring first channel data and second channel data of each frame of audio unit;
Performing low-pass filtering on the first channel data to obtain the first target audio;
And performing band-pass filtering on the second channel data to obtain the second target audio.
The first channel data and the second channel data may generally be the audio data corresponding to two channels in an audio playing module of the corresponding terminal device (such as an intelligent terminal or a game machine); for example, the first channel data is left channel data and the second channel data is right channel data. In some examples, the first channel data may have a first partial characteristic of the audio data to be processed, e.g., the left channel may carry the characteristic obtained after compressing the low audio region signal, and the second channel data may have a second partial characteristic of the audio data to be processed, e.g., the right channel may carry the characteristic obtained after compressing the medium and high audio region signal; the first partial characteristic need not be identical to the second partial characteristic. In another example, the first channel data and the second channel data may both be the audio data to be processed; for example, when the audio data to be processed is mono data, the first channel data and the second channel data are both copies of the audio data to be processed.
Specifically, the audio frequency points of the first audio scene lie mainly in a frequency range such as 100Hz-2000Hz. Performing low-pass filtering on the first channel data can filter out interference audio such as human voice in the game and reduce the noise data in the first target audio, so that the first target audio effectively represents characteristics of the specific scene such as the background music and/or the control target speed. When the control target performs an action in the second direction, such as drifting, during the game, the generated audio is mainly concentrated in a frequency range such as 1100Hz-1300Hz. Performing band-pass filtering on the second channel data can extract the audio that represents the movement of the control target in the second direction, so that the second target audio represents state change characteristics such as the speed of the control target in the second direction.
Specifically, in this embodiment, the cut-off frequency of the low-pass filtering may be set according to the frequency ranges in which the effective audio and the interference audio of the first audio scene are respectively located, and is generally set to a value that passes the effective audio while filtering out as much of the interference audio as possible. For example, if the interference audio of the first audio scene, i.e., the human voice, lies mainly in the high-frequency part of the 100Hz-2000Hz range, the cut-off frequency of the low-pass filtering may be set to 225Hz so as to filter out the human voice as far as possible. The pass band of the band-pass filtering can be set according to the frequency band, in the specific game product, of the audio that represents the action characteristic of the control target in the second direction, and is often set to that frequency band itself, for example 1100Hz-1300Hz.
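The two filtering branches can be illustrated with second-order (biquad) filters. This is only a sketch: the text does not specify the filter type or order, the sampling rate, or how the energy mean is computed. The biquad design below uses the widely known RBJ audio EQ cookbook formulas; the 48000 Hz sampling rate, the Q values, the mean absolute amplitude used as the energy mean, and all function names are assumptions made for illustration.

```python
import math

FS = 48000.0  # assumed sampling rate in Hz


def biquad_coeffs(kind, f0, q, fs=FS):
    """RBJ cookbook coefficients, normalised so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    if kind == "lowpass":
        b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    elif kind == "bandpass":  # 0 dB peak gain at the centre frequency
        b = [alpha, 0.0, -alpha]
    else:
        raise ValueError("unsupported filter kind")
    a0 = 1.0 + alpha
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return [v / a0 for v in b], a


def biquad_filter(samples, b, a):
    """Apply a direct form I biquad to a sequence of samples."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def low_pass_first_channel(frame):
    """First channel: low-pass with the 225 Hz cut-off from the text."""
    b, a = biquad_coeffs("lowpass", 225.0, 0.7071)
    return biquad_filter(frame, b, a)


def band_pass_second_channel(frame):
    """Second channel: pass band around 1100Hz-1300Hz (centre 1200 Hz)."""
    b, a = biquad_coeffs("bandpass", 1200.0, 1200.0 / 200.0)
    return biquad_filter(frame, b, a)


def energy_mean(frame):
    """Assumed energy measure: mean absolute amplitude of the frame."""
    return sum(abs(v) for v in frame) / len(frame)
```

In a real implementation the filter state would normally be carried over from one frame to the next rather than reset per call, and a higher-order filter might be needed for sharper separation; both are design choices left open by the text.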
A second aspect of the present application provides a motor driving method which, referring to fig. 2, comprises:
S700, according to the method for identifying audio scenes provided in any of the above embodiments, the audio scene corresponding to the currently played audio unit is identified.
S800, according to the audio scene, a corresponding vibration rule is obtained.
The vibration rule records the vibration characteristics with which the motor is driven to vibrate. The vibration characteristics may include the vibration amplitude and/or the trend of change of the vibration. When the vibration rule corresponding to each frame of audio unit is set, a maximum vibration sense flag bit TRIG_FLAG may be set to mark the vibration sense degree of the corresponding vibration signal. TRIG_FLAG may take two values, a first flag and a second flag: the first flag indicates that the corresponding vibration sense degree is high, and the second flag indicates that the corresponding vibration sense degree is low. The initial value of TRIG_FLAG may be set to the second flag, indicating a low vibration sense degree. In one example, the first flag may be denoted 1 and the second flag may be denoted 0.
S900, the motor is driven to vibrate according to the vibration rule, realizing the vibration effect corresponding to the currently played audio unit.
In the motor driving method, the audio scene corresponding to the currently played audio unit is identified, the corresponding vibration rule is obtained, and the motor is driven to vibrate according to the vibration rule, so that the vibration effect corresponding to the currently played audio unit is realized. This effectively enriches the perception signals of racing game products, so that a user can more comprehensively perceive the various state changes of the control target in these scenes during the game and obtain a stronger sense of participation.
In one embodiment, the vibration rules comprise a first vibration rule and a second vibration rule, and the obtaining the corresponding vibration rule according to the audio scene comprises determining the first vibration rule according to a first average value and a first trigger threshold value of the current frame if the audio unit of the current frame is generated in the first audio scene, and determining the second vibration rule according to a second frame number count of the current frame if the audio unit of the current frame is generated in the second audio scene.
In this embodiment, the first vibration rule of the first audio scene and the second vibration rule of the second audio scene can be determined separately, and the motor is driven to vibrate according to the corresponding vibration rule when each frame of audio unit is played, so that the corresponding racing game product can provide vibration signals for specific game scenes, namely the first audio scene and/or the second audio scene, further improving the vibration effect of the corresponding game product.
In one embodiment, the determining the first vibration rule according to the first average value and the first trigger threshold of the current frame includes: if AVE_L(n) > MAX_THR, setting the amplitude of the current frame to a second amplitude value and setting the maximum vibration sense flag bit to the first flag; if AVE_L(n) > MOVING_THR, controlling the amplitude of the current frame to increase according to a vibration sense climbing rule when the maximum vibration sense flag bit is the second flag and GAIN(n-1) < GAIN_MAX, and setting the amplitude of the current frame to the motor maximum amplitude when the maximum vibration sense flag bit is the first flag and GAIN(n-1) > GAIN_MAX; wherein AVE_L(n) represents the first average value of the current frame, MAX_THR represents the maximum trigger threshold, the maximum vibration sense flag bit marks the vibration sense degree, MOVING_THR represents the first trigger threshold, GAIN(n-1) represents the amplitude of the previous frame, GAIN_MAX represents the motor maximum amplitude, and the vibration sense climbing rule records a climbing control matrix containing a plurality of amplitude values in sequence.
and/or,
The determining the second vibration rule according to the second frame count of the current frame includes: if BP_CNT(n) ≥ m, setting the amplitude of the current frame to a first amplitude value, wherein BP_CNT(n) represents the second frame count of the current frame and m represents a count threshold of the second audio scene. The count threshold can be set according to the characteristics of the corresponding control target action in the second game scene; for example, it can be set to 6 for the drifting action.
The number of the amplitude values recorded by the climbing control matrix can be set according to the corresponding vibration effect. In one example, the climbing control matrix records 8 amplitude values, which may be denoted as GAIN_STEP [8], and the specific value features may be in sequence:
Specifically, the first vibration rule further includes:
If AVE_L (n) > MOVING_THR, the maximum vibration sense flag bit is the first flag, and GAIN (n-1) is larger than the third amplitude value, the amplitude of the current frame is set to the first amplitude value.
The first amplitude value, the second amplitude value, and the third amplitude value may each be set according to the vibration sensation to be configured for the corresponding action characteristics, so as to enhance the vibration effect.
Further, the controlling the amplitude of the current frame according to the vibration sense climbing rule includes:
acquiring a climbing frame count, looking up in the climbing control matrix the amplitude value whose position equals the climbing frame count, and determining the amplitude of the current frame as the sum of the looked-up amplitude value and GAIN(n-1).
In this embodiment, the climbing frame count is used as the index into the climbing control matrix, and the amplitude of the current frame is determined from the amplitude value found at that index, which ensures that the vibration sense climbing rule is applied in order.
Optionally, after controlling the amplitude of the current frame to increase according to the vibration sense climbing rule, the method further comprises:
when GAIN(n) > GAIN_MAX, setting the amplitude of the current frame to the motor maximum amplitude, setting the maximum vibration sense flag bit to the first flag, and incrementing the climbing frame count by one, wherein GAIN(n) represents the amplitude of the current frame.
Optionally, the motor driving method further includes:
and if the climbing frame number count is greater than the climbing frequency threshold, setting the climbing frame number count as the climbing frequency threshold.
The climbing count threshold can be set according to the number of amplitude values in the climbing control matrix and is usually set to a value slightly smaller than that number. For example, when the climbing control matrix contains 8 amplitude values, the climbing count threshold can be set to 5, so that for every value of the climbing frame count a corresponding amplitude value can be found in the climbing control matrix for vibration control.
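One climbing step can be sketched as below. The contents of the climbing control matrix and the value of GAIN_MAX are not given in the text, so the numbers used here are purely hypothetical; only the matrix length (8) and the climbing count threshold (5) come from the examples above. The text is also ambiguous about when the climbing frame count is incremented; the sketch assumes it is incremented after every climbing step, and the function name is hypothetical.

```python
# Hypothetical climbing control matrix with 8 amplitude increments;
# the actual values are not given in the text.
GAIN_STEP = [2, 4, 6, 8, 10, 12, 14, 16]
GAIN_MAX = 40       # hypothetical motor maximum amplitude
RAMP_CNT_MAX = 5    # climbing count threshold (example from the text)


def climb_step(prev_gain, ramp_cnt, trig_flag):
    """Return (GAIN(n), climbing frame count, TRIG_FLAG) after one step."""
    # Look up the increment at the position given by the climbing count
    # and add it to the amplitude of the previous frame.
    gain = prev_gain + GAIN_STEP[ramp_cnt]
    if gain > GAIN_MAX:
        gain = GAIN_MAX
        trig_flag = 1  # first flag: high vibration sense degree
    # Assumption: the count advances on every climbing step.
    ramp_cnt += 1
    # Clamp so that the next lookup always stays inside the matrix.
    if ramp_cnt > RAMP_CNT_MAX:
        ramp_cnt = RAMP_CNT_MAX
    return gain, ramp_cnt, trig_flag


# Usage: ramp up from rest over six consecutive frames.
gain, ramp_cnt, trig_flag = 0, 0, 0
gains = []
for _ in range(6):
    gain, ramp_cnt, trig_flag = climb_step(gain, ramp_cnt, trig_flag)
    gains.append(gain)
```

Setting the climbing count threshold below the matrix length guarantees that the index never runs past the last entry, which is the purpose stated for choosing 5 with an 8-entry matrix.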
In a third aspect, the present application provides an audio scene recognition system, as shown in fig. 3, including:
a first obtaining module 100, configured to obtain audio data to be processed;
a dividing module 200, configured to divide the audio data to be processed into a plurality of frame audio units that are continuous in time sequence;
The filtering module 300 is configured to perform filtering processing on each frame of audio unit according to the band characteristics corresponding to the audio scene, so as to obtain target audio;
a second obtaining module 400, configured to obtain a frame number count and an energy average value of each frame of audio unit in the target audio, where the frame number count is used to characterize a feature of a specific scene;
the judging module 500 is configured to compare the frame count or the energy average value with feature thresholds corresponding to different audio scenes, and judge an audio scene corresponding to each frame of audio unit.
For specific limitations of the audio scene recognition system, reference may be made to the above limitations of the method for identifying an audio scene, which are not repeated here. The modules of the audio scene recognition system may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
The present application in a fourth aspect provides a motor driving system, as shown in fig. 4, comprising:
The identifying module 700 is configured to identify an audio scene corresponding to a currently played audio unit according to the identifying system of an audio scene provided in any one of the above embodiments;
a third obtaining module 800, configured to obtain a corresponding vibration rule according to the audio scene;
and the driving module 900 is used for driving the motor to vibrate according to the vibration rule so as to realize the vibration effect corresponding to the currently played audio unit.
For specific limitations of the motor driving system, reference may be made to the above limitations of the motor driving method, which are not repeated here. The modules of the motor driving system may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
The application provides an electronic device in a fifth aspect, which comprises a processor and a storage medium, wherein the storage medium is stored with program codes, and the processor is used for calling the program codes stored in the storage medium to execute the method for identifying the audio scene according to any embodiment.
In one embodiment, the electronic device further comprises a motor, and the processor is further configured to execute the motor driving method provided in any one of the embodiments.
Specifically, referring to fig. 5, the electronic device further includes a motor driving chip, where when the electronic device plays each frame of audio unit, the processor may control the motor driving chip to drive the motor to vibrate according to a vibration rule corresponding to each frame of audio unit.
Further, the electronic device may include components such as an audio power amplifier and a speaker to play each frame of audio unit. In this case, the working process of the electronic device may be as shown in fig. 6: first, the audio data of the racing game is obtained; after it is determined that the audio data to be processed is generated in the first audio scene and/or the second audio scene, the vibration rule corresponding to each frame of audio unit is obtained; each frame of audio unit is then played through the audio power amplifier and the speaker while the motor driving chip is controlled to drive the motor to vibrate according to the vibration rule corresponding to that frame of audio unit.
The electronic device can also be called a terminal device, and can provide vibration signals for the specific scenes of the first audio scene and/or the second audio scene of the racing game, so that user perception signals of the game products are effectively enriched, and a user can more comprehensively perceive various state changes of the control targets in the scenes in the game process, and user experience is effectively improved.
Although the application has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The present application includes all such modifications and alterations and is limited only by the scope of the following claims. In particular regard to the various functions performed by the above described components, the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure which performs the function in the herein illustrated exemplary implementations of the specification.
That is, the foregoing embodiments of the present application are merely examples, and are not intended to limit the scope of the present application, and all equivalent structures or equivalent processes using the descriptions of the present application and the accompanying drawings, such as the combination of technical features of the embodiments, or direct or indirect application in other related technical fields, are included in the scope of the present application.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the present application, "a plurality" means two or more, unless explicitly defined otherwise.
The previous description is provided to enable any person skilled in the art to make or use the present application. In the above description, various details are set forth for purposes of explanation. It will be apparent to one of ordinary skill in the art that the present application may be practiced without these specific details. In other instances, well-known processes have not been described in detail in order to avoid unnecessarily obscuring the description of the present application. Thus, the present application is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.