
WO2023000206A1 - Method, apparatus and system for voice sound source localization - Google Patents

Method, apparatus and system for voice sound source localization

Info

Publication number
WO2023000206A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
sensing information
transfer relationship
acoustic transfer
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2021/107616
Other languages
English (en)
Chinese (zh)
Inventor
王浩
刘成明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to PCT/CN2021/107616 priority Critical patent/WO2023000206A1/fr
Priority to CN202180007542.XA priority patent/CN116368398A/zh
Publication of WO2023000206A1 publication Critical patent/WO2023000206A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20Position of source determined by a plurality of spaced direction-finders
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation

Definitions

  • the embodiments of the present application relate to the field of acoustics, and in particular to a method, device, and system for locating a voice sound source.
  • the interior acoustic experience has gradually become one of the important considerations for users when purchasing a vehicle.
  • multiple users in the cockpit of the car are seated in different positions.
  • the sound field can be adjusted according to the position of the speaking user, such as voice enhancement, noise suppression, voice separation and other functions.
  • Accurate voice source localization can improve the experience of human machine interaction (HMI) such as in-car calls.
  • the location information of the vocalizing user can be obtained through a sensor array as prior information for the implementation of functions such as speech enhancement, noise suppression, and speech separation. Therefore, how to improve the robustness of speech sound source localization is worth studying.
  • Embodiments of the present application provide a method, device and system for locating a voice source, so as to improve the performance of computing equipment or reduce the cost of computing equipment.
  • the embodiment of the present application provides a voice sound source localization method, including: acquiring sound sensing information of a first voice, the sound sensing information being determined by a plurality of sound sensors; and determining, according to the sound sensing information and an acoustic transfer relationship, the sound source position of the first voice from multiple areas in a space; wherein the acoustic transfer relationship is used to indicate the transfer relationship between the sound sensing information collected by the plurality of sound sensors and the one or more areas when audio is played in one or more areas in the space, and the acoustic transfer relationship is predetermined based on non-free-field conditions.
  • the audio may include one of white noise and pink noise.
  • the space above can be the cockpit space of the car, and the area can be the seat area in the cockpit of the car.
  • for example, the driver area and the front passenger area; for another example, the front-row area and the rear-row area. The areas can also be distinguished by seat numbers.
  • the acoustic transfer relationship can be in various forms, such as functions, formulas, tables, correspondences, etc.
  • the spaces mentioned above can also be different areas within the same room.
  • the modeling of the acoustic transfer relationship of the sound field in the space is more accurate.
  • the acoustic transfer relationship modeled by this method has stronger anti-interference ability in noisy scenes, which can improve the accuracy and robustness of voice sound source localization.
  • For the same car model, only one set of measurements in the different areas is required, instead of one measurement per vehicle, so the measurement cost is low.
  • the determining the sound source position of the first voice from the space according to the sound sensing information and the acoustic transfer relationship includes: determining the power sums of multiple areas in the space according to the sound sensing information and the acoustic transfer relationship; and determining the area corresponding to the largest power sum among the multiple areas as the sound source position of the first voice.
  • the determining the sound source position of the first voice from the space according to the sound sensing information and the acoustic transfer relationship includes: determining the power sums of multiple areas in the space according to the sound sensing information and the acoustic transfer relationship; and when the power sum of one or more areas is greater than a threshold, determining that the one or more areas are the sound source position of the first voice.
  • the acoustic transfer relationship is related to a ratio between the frequency domain information of the audio and the frequency domain information of the sound sensing information.
  • the power is related to a difference between the sound sensing information collected by the plurality of sound sensors.
  • the plurality of sound sensors is a distributed sound sensor array, and the number of the plurality of sound sensors is greater than or equal to two.
  • the multiple sound sensors are a centralized sound sensor array, and the number of the multiple sound sensors is greater than or equal to two.
  • the above method has lower requirements on the number of sound collection devices, can reduce the use of devices such as sound sensor arrays and audio transmission lines, and reduce hardware costs. At the same time, the occupied communication channels and computing resources are also reduced.
  • the embodiment of the present application provides a method for locating a voice sound source, in which audio is played in a first area and a second area in a space, and the method includes: acquiring sound sensing information in the space, the sound sensing information being determined by a plurality of sound sensors in the space; and determining an acoustic transfer relationship of the space according to the sound sensing information;
  • the acoustic transfer relationship includes the acoustic transfer relationship of the first area and the acoustic transfer relationship of the second area, and the acoustic transfer relationship of the first area and the acoustic transfer relationship of the second area are used to determine the location of a sound source in the space.
  • the audio may include one of white noise and pink noise.
  • the space above can be the cockpit space of the car, and the area can be the seat area in the cockpit of the car.
  • for example, the driver area and the front passenger area; for another example, the front-row area and the rear-row area. The areas can also be distinguished by seat numbers.
  • the acoustic transfer relationship can be in various forms, such as functions, formulas, tables, correspondences, etc.
  • the spaces mentioned above can also be different areas within the same room.
  • the modeling of the acoustic transfer relationship of the sound field in the space is more accurate.
  • the acoustic transfer relationship modeled by this method has stronger anti-interference ability in noisy scenes, which can improve the accuracy and robustness of voice sound source localization.
  • For the same car model, only one set of measurements in the different areas is required, instead of one measurement per vehicle, so the measurement cost is low.
  • optionally, the method further includes: acquiring the audio.
  • the determining the acoustic transfer relationship of the space according to the sound sensing information includes:
  • the acoustic transfer relationship of the space is determined according to the frequency domain information of the sound sensing information and the frequency domain information of the audio.
  • the audio includes first audio and second audio
  • the sound sensing information includes first sound sensing information and second sound sensing information
  • the first sound sensing information is the sound sensing information obtained when the first audio is played in the first area
  • the second sound sensing information is the sound sensing information obtained when the second audio is played in the second area
  • the determining the acoustic transfer relationship of the space according to the frequency domain information of the sound sensing information and the frequency domain information of the audio includes:
  • determining the acoustic transfer relationship of the first area according to the ratio of the first sound sensing information to the first audio, and determining the acoustic transfer relationship of the second area according to the ratio of the second sound sensing information to the second audio.
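As an illustrative aside (not part of the original disclosure), the ratio-based determination above can be sketched in a few lines of Python. Representing the transfer relationship as one complex frequency response per sensor, and all function and variable names, are assumptions made for illustration only:

```python
import numpy as np

def estimate_transfer(played_audio, sensed, eps=1e-12):
    """Estimate H(f) = Y(f) / X(f) for each sensor.

    played_audio: (n_samples,) source audio played in one area.
    sensed:       (n_sensors, n_samples) signals captured by the sensors.
    Returns:      (n_sensors, n_bins) complex frequency responses.
    """
    X = np.fft.rfft(played_audio)     # frequency domain information of the audio
    Y = np.fft.rfft(sensed, axis=-1)  # frequency domain information of the sensing info
    return Y / (X + eps)              # per-sensor, per-bin ratio

# One transfer relationship per area, each measured while audio plays there:
rng = np.random.default_rng(0)
audio = rng.standard_normal(4096)                 # stand-in broadband signal
sensed_area1 = rng.standard_normal((4, 4096))     # stand-in measurements
H_area1 = estimate_transfer(audio, sensed_area1)  # shape (4, 2049)
```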
  • the sound sensing information includes first sound sensing information and second sound sensing information, the first sound sensing information includes the sound sensing information determined by the plurality of sound sensors when the audio is played in the first area, and the second sound sensing information includes the sound sensing information determined by the plurality of sound sensors when the audio is played in the second area;
  • the plurality of sound sensors includes I sensors, where I is a positive integer greater than or equal to 2.
  • the determining the acoustic transfer relationship of the space according to the sound sensing information includes: determining the acoustic transfer relationship of the first area according to the difference between the frequency domain information of the first sound sensing information determined by I-1 sound sensors among the plurality of sound sensors and the frequency domain information of the first sound sensing information determined by the same remaining sound sensor other than the I-1 sound sensors; and determining the acoustic transfer relationship of the second area according to the difference between the frequency domain information of the second sound sensing information determined by the I-1 sound sensors and the frequency domain information of the second sound sensing information determined by the same remaining sound sensor.
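Purely as an illustration of the reference-sensor variant above, the sketch below takes a literal complex-spectrum difference between each of the I-1 sensors and the remaining reference sensor, which follows the wording of the text; a per-bin ratio (a relative transfer function) would be a common alternative reading. Names and shapes are assumptions:

```python
import numpy as np

def relative_transfer(sensed, ref=0):
    """Frequency-domain differences of I-1 sensors against one reference sensor.

    sensed: (I, n_samples) signals captured while audio plays in one area.
    ref:    index of the reference sensor excluded from the I-1 set.
    Returns: (I-1, n_bins) complex differences, one row per non-reference sensor.
    """
    spectra = np.fft.rfft(sensed, axis=-1)
    others = np.delete(spectra, ref, axis=0)  # the I-1 non-reference sensors
    return others - spectra[ref]              # difference per frequency bin
```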
  • the plurality of sound sensors is a distributed sound sensor array, and the number of the plurality of sound sensors is greater than or equal to two.
  • the multiple sound sensors are a centralized sound sensor array, and the number of the multiple sound sensors is greater than or equal to two.
  • the above method has lower requirements on the number of sound collection devices, can reduce the use of devices such as sound sensor arrays and audio transmission lines, and reduce hardware costs. At the same time, communication channels and computing resources are also reduced.
  • the embodiment of the present application provides a voice sound source localization device, including a processing unit and a transceiver unit, wherein the transceiver unit is used to obtain sound sensing information of a first voice, the sound sensing information being determined by a plurality of sound sensors;
  • the processing unit is configured to determine the position of the sound source of the first voice from multiple regions of the space according to the sound sensing information and the acoustic transfer relationship;
  • the acoustic transfer relationship is used to represent the transfer relationship between the sound sensing information collected by the plurality of sound sensors and the one or more regions when audio is played in one or more regions in the space,
  • the acoustic transfer relationship is predetermined based on non-free field conditions.
  • the embodiment of the present application provides a voice sound source localization device, including a processing unit and a transceiver unit, the transceiver unit is used to obtain the sound sensor information in the space, and the sound sensor information is determined by the multiple sound sensors in the space;
  • the processing unit is configured to determine the acoustic transfer relationship of the space according to the sound sensing information
  • the acoustic transfer relationship includes the acoustic transfer relationship of the first area and the acoustic transfer relationship of the second area, and the acoustic transfer relationship of the first area and the acoustic transfer relationship of the second area are used to determine the location of a sound source in the space.
  • the embodiment of the present application provides a speech sound source localization device, including a processor and a memory, wherein the memory stores program code, and when the program code is executed by the processor, the method described in any one of the first aspect to the second aspect or any possible implementation manner of any aspect is implemented.
  • the embodiment of the present application provides a speech sound source localization device, including: a processor and an interface circuit; wherein, the processor is coupled with the memory through the interface circuit, and the processor is used to execute the program code in the memory , so as to implement the method described in any one of the first aspect to the second aspect or any possible implementation manner of any aspect.
  • the speech sound source localization devices provided in the third aspect to the sixth aspect can be used to implement the method described in any one of the first to the second aspects or any possible implementation manner of any aspect.
  • when the voice sound source localization device is applied to a cockpit, it may be a vehicle-mounted device, a vehicle-mounted chip, a vehicle, a vehicle-mounted processor, or the like.
  • when the voice sound source localization device is applied to a smart home, it may be a device such as a smart speaker or a smart chip.
  • the embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores program code, and when the program code is executed by a terminal or a processor in the terminal, the method described in any one of the first aspect to the second aspect or any possible implementation manner of any aspect is implemented.
  • the embodiment of the present application provides a computer program product; when the program code contained in the computer program product is executed by a processor in a terminal, the method described in any one of the first aspect to the second aspect or any possible implementation manner of any aspect can be realized.
  • the embodiment of the present application provides a system, including the device described in any one of the third to sixth aspects or any possible implementation manner of any one of those aspects.
  • the non-free-field model is used to model the acoustic transfer relationship in the space more accurately, and the acoustic transfer relationship modeled in this way has stronger anti-interference ability, which can improve the accuracy and robustness of speech sound source localization.
  • the above method has lower requirements on the number of sound collection devices, which can reduce the use of devices such as sound sensor arrays and audio transmission lines, and reduce hardware costs.
  • communication channels and computing resources are also reduced.
  • the computing device for modeling and applying the acoustic transfer relationship can be the same device or different devices, which is more flexible.
  • FIG. 1 is a schematic diagram of a functional framework of a vehicle provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a system architecture of a vehicle provided in an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of an automobile interior provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a measurement system provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the installation position of a sound collection device provided by the embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a method for localizing a voice source provided by an embodiment of the present application
  • FIG. 7 is an example diagram of a suggested installation position of an acoustic sensor provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of a method for locating a voice source provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of a method for localizing a voice source provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a method for localizing a speech sound source according to an embodiment of the present application.
  • FIG. 11 is an example diagram of an installation method of a system 400 provided in an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of a speech sound source localization device provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a speech sound source localization device provided in an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of a speech sound source localization device provided by an embodiment of the present application.
  • the speech sound source localization solution provided in the embodiment of the present application includes a speech sound source localization method, device, and system. Since the principles by which these technical solutions solve problems are the same or similar, some repeated content may not be described again in the following specific embodiments; these specific embodiments should be considered to reference each other and can be combined with each other.
  • FIG. 1 is a schematic diagram of a functional framework of a vehicle 100 provided by an embodiment of the present application.
  • a vehicle 100 may include various subsystems, such as an infotainment system 110 , a perception system 120 , a decision-making control system 130 , a drive system 140 , and a computing platform 150 .
  • vehicle 100 may include more or fewer subsystems, and each subsystem may include one or more components.
  • each subsystem and component of the vehicle 100 may be interconnected in a wired or wireless manner.
  • the infotainment system 110 may include a communication system 111 , an entertainment system 112 and a navigation system 113 .
  • Communication system 111 may include a wireless communication system that may wirelessly communicate with one or more devices, either directly or via a communication network.
  • the wireless communication system 146 may use a third generation (3rd generation, 3G) cellular communication technology, such as code division multiple access (code division multiple access, CDMA), or a fourth generation (4th generation, 4G) cellular communication technology, such as long term evolution (long term evolution, LTE) communication technology.
  • it may also use a fifth generation (5th generation, 5G) cellular communication technology, such as new radio (new radio, NR) communication technology.
  • the wireless communication system may communicate with a wireless local area network (wireless local area network, WLAN) by using WiFi.
  • the wireless communication system 146 may communicate directly with the device using an infrared link, Bluetooth, or ZigBee.
  • other wireless protocols may also be used, such as various vehicle communication systems; for example, the wireless communication system may include one or more dedicated short range communications (DSRC) devices, which may include public and/or private data communications between vehicles and/or roadside stations.
  • the entertainment system 112 can include a central control screen, a microphone, and a sound system. Users can listen to the radio or play music in the car based on the entertainment system. The central control screen can be a touch screen, and users can operate it by touch. In some cases, the user's voice signal can be acquired through the microphone, and the vehicle 100 can be controlled based on the analysis of the user's voice signal, such as adjusting the temperature inside the vehicle. In other cases, music may be played to the user via a speaker.
  • the navigation system 113 may include a map service provided by a map provider, so as to provide navigation for the driving route of the vehicle 100 , and the navigation system 113 may cooperate with the global positioning system 121 and the inertial measurement unit 122 of the vehicle.
  • the map service provided by the map provider can be a two-dimensional map or a high-definition map.
  • the perception system 120 may include several kinds of sensors that sense information about the environment around the vehicle 100 .
  • the perception system 120 may include a global positioning system 121 (the global positioning system may be a global positioning satellite (global position satellite, GPS) system, or the Beidou system or other positioning systems), an inertial measurement unit (inertial measurement unit, IMU) 122 , laser radar 123 , millimeter wave radar 124 , ultrasonic radar 125 and camera device 126 .
  • the perception system 120 may also include sensors of the interior systems of the monitored vehicle 100 (eg, interior air quality monitors, fuel gauges, oil temperature gauges, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding properties (position, shape, orientation, velocity, etc.). Such detection and identification is a critical function for safe operation of the vehicle 100 .
  • the global positioning system 121 may be used to estimate the geographic location of the vehicle 100 .
  • the inertial measurement unit 122 is used to sense the position and orientation changes of the vehicle 100 based on inertial acceleration.
  • inertial measurement unit 122 may be a combination accelerometer and gyroscope.
  • the lidar 123 may utilize laser light to sense objects in the environment in which the vehicle 100 is located.
  • lidar 123 may include one or more laser sources, a laser scanner, and one or more detectors, among other system components.
  • the millimeter wave radar 124 may utilize radio signals to sense objects within the surrounding environment of the vehicle 100 .
  • in addition, the millimeter wave radar 124 may be used to sense the velocity and/or heading of objects.
  • the ultrasonic radar 125 may sense objects around the vehicle 100 using ultrasonic signals.
  • the camera device 126 can be used to capture image information of the surrounding environment of the vehicle 100 .
  • the camera device 126 may include a monocular camera, a binocular camera, a structured light camera, a panoramic camera, etc., and the image information acquired by the camera device 126 may include still images or video stream information.
  • the decision-making control system 130 includes a computing system 131 for analyzing and making decisions based on the information acquired by the perception system 120.
  • the decision-making control system 130 also includes a vehicle controller 132 for controlling the power system of the vehicle 100, and a steering system 133 for controlling the steering of the vehicle 100.
  • Computing system 131 is operable to process and analyze various information acquired by perception system 120 in order to identify objects, objects, and/or features in the environment surrounding vehicle 100 .
  • the objects may include pedestrians or animals, and the objects and/or features may include traffic signals, road boundaries, and obstacles.
  • the computing system 131 may use technologies such as object recognition algorithms, structure from motion (SFM) algorithms, and video tracking. In some embodiments, computing system 131 may be used to map the environment, track objects, estimate the velocity of objects, and the like.
  • the computing system 131 can analyze various information obtained and obtain a control strategy for the vehicle.
  • the vehicle controller 132 can be used for coordinated control of the power battery and the engine 141 of the vehicle, so as to improve the power performance of the vehicle 100 .
  • the steering system 133 is operable to adjust the heading of the vehicle 100 .
  • it could be a steering wheel system.
  • the throttle 134 is used to control the operating speed of the engine 141 and thus the speed of the vehicle 100 .
  • the braking system 135 is used to control deceleration of the vehicle 100 .
  • Braking system 135 may use friction to slow wheels 144 .
  • braking system 135 may convert kinetic energy of wheels 144 into electrical current.
  • the braking system 135 may also take other forms to slow the wheels 144 to control the speed of the vehicle 100 .
  • Drive system 140 may include components that provide powered motion to vehicle 100 .
  • drive system 140 may include engine 141 , energy source 142 , transmission 143 and wheels 144 .
  • the engine 141 may be an internal combustion engine, an electric motor, an air compression engine or other types of engine combinations, such as a hybrid engine composed of a gasoline engine and an electric motor, or a hybrid engine composed of an internal combustion engine and an air compression engine.
  • Engine 141 converts energy source 142 into mechanical energy.
  • Examples of energy source 142 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electrical power.
  • the energy source 142 may also provide energy to other systems of the vehicle 100 .
  • Transmission 143 may transmit mechanical power from engine 141 to wheels 144 .
  • Transmission 143 may include a gearbox, a differential, and a drive shaft.
  • the transmission device 143 may also include other devices, such as clutches.
  • drive shafts may include one or more axles that may be coupled to one or more wheels 144.
  • Computing platform 150 may include at least one processor 151 that may execute instructions 153 stored in a non-transitory computer-readable medium such as memory 152 .
  • computing platform 150 may also be a plurality of computing devices that control individual components or subsystems of vehicle 100 in a distributed manner.
  • the processor 151 may be any conventional processor, such as a central processing unit (central processing unit, CPU). Alternatively, the processor 151 may also include, for example, a graphics processing unit (graphics processing unit, GPU), a field programmable gate array (field programmable gate array, FPGA), a system on chip (system on chip, SOC), an application specific integrated circuit (application specific integrated circuit, ASIC), or a combination thereof.
  • although FIG. 1 functionally illustrates the processor, memory, and other elements of computer 110 in the same block, those of ordinary skill in the art will understand that the processor, computer, or memory may actually include multiple processors, computers, or memories that may or may not be stored within the same physical enclosure.
  • the memory may be a hard drive or other storage medium located in a different housing than the computer 110 .
  • references to a processor or computer are to be understood to include references to collections of processors or computers or memories that may or may not operate in parallel.
  • some components, such as the steering and deceleration components, may each have their own processor that only performs calculations related to component-specific functions.
  • the processor may be located remotely from the vehicle and be in wireless communication with the vehicle. In other aspects, some of the processes described herein are executed on a processor disposed within the vehicle while others are executed by a remote processor, including taking the necessary steps to perform a single maneuver.
  • memory 152 may contain instructions 153 (eg, program logic) executable by processor 151 to perform various functions of vehicle 100 .
  • Memory 152 may also contain additional instructions, including sending data to, receiving data from, interacting with, and/or controlling one or more of infotainment system 110 , perception system 120 , decision control system 130 , drive system 140 instructions.
  • memory 152 may also store data such as road maps, route information, the vehicle's position, direction, speed, and other such vehicle data, among other information. Such information may be used by vehicle 100 and computing platform 150 during operation of vehicle 100 in autonomous, semi-autonomous, and/or manual modes.
  • Computing platform 150 may control functions of vehicle 100 based on input received from various subsystems (eg, drive system 140 , perception system 120 , and decision-making control system 130 ). For example, computing platform 150 may utilize input from decision control system 130 in order to control steering system 133 to avoid obstacles detected by perception system 120 . In some embodiments, computing platform 150 is operable to provide control over many aspects of vehicle 100 and its subsystems.
  • one or more of these components described above may be installed separately from or associated with the vehicle 100 .
  • memory 152 may exist partially or completely separate from vehicle 100 .
  • the components described above may be communicatively coupled together in a wired and/or wireless manner.
  • FIG. 1 should not be construed as limiting the embodiment of the present application.
  • An autonomous vehicle traveling on a road can identify objects within its surroundings to determine adjustments to the current speed.
  • the objects may be other vehicles, traffic control devices, or other types of objects.
  • each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and distance to the vehicle, may be used to determine the speed to which the autonomous vehicle is to adjust.
  • the vehicle 100 or a sensing and computing device (e.g., computing system 131, computing platform 150) associated with the vehicle 100 may predict the behavior of the identified objects based on the identified characteristics of the objects and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.).
  • optionally, the identified objects may depend on each other's behavior, so all identified objects can also be considered together to predict the behavior of a single identified object.
  • the vehicle 100 is able to adjust its speed based on the predicted behavior of the identified object.
  • the self-driving car can determine which state the vehicle will need to adjust to (eg, accelerate, decelerate, or stop) based on the predicted behavior of the object.
  • other factors may also be considered to determine the speed of the vehicle 100 , such as the lateral position of the vehicle 100 in the traveling road, the curvature of the road, the proximity of static and dynamic objects, and the like.
  • the computing device may also provide instructions to modify the steering angle of the vehicle 100 so that the self-driving car follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects in the vicinity of the self-driving car (e.g., cars in adjacent lanes on the road).
  • the vehicle 100 mentioned above can be a car, truck, bus, ship, airplane, helicopter, lawn mower, recreational vehicle, playground vehicle, construction equipment, tram, golf cart, train, etc.; this is not specifically limited in the embodiment of the present application.
  • the vehicle 200 may include part or all of: a vehicle integration unit (vehicle integration unit, VIU) 11, a communication box (telematic box, T-BOX) 12, a cockpit domain controller (cockpit domain controller, CDC) 13, a mobile data center (mobile data center, MDC) 14, and a vehicle domain controller (vehicle domain controller, VDC) 15.
  • the vehicle 200 may also be provided with various types of sensors on the vehicle body, including: laser radar 21, millimeter wave radar 22, ultrasonic radar 23, and camera device 24. It should be understood that although FIG. 2 shows a location layout of different sensors on the vehicle 200, the type, number, and location layout of the sensors in FIG. 2 are only an example, and those skilled in the art can select an appropriate type, number, and location layout of sensors according to actual needs.
  • multiple VIUs are shown in FIG. 2. It should be understood that the number and positions of the VIUs in FIG. 2 are only an example, and those skilled in the art can select an appropriate number and positions of VIUs according to actual needs.
  • the vehicle integration unit VIU 11 provides some or all of the data processing functions or control functions required by the vehicle components for multiple vehicle components.
  • VIU can have one or more of the following functions.
  • Electronic control function: that is, the VIU is used to realize the electronic control functions provided by the electronic control units (electronic control unit, ECU) inside some or all of the vehicle components.
  • for example, the control function required by a certain vehicle component, or, for another example, the data processing function required by a certain vehicle component.
  • The same functions as a gateway: that is, the VIU can also have some or all of the same functions as a gateway, for example, protocol conversion, protocol encapsulation and forwarding, and data format conversion.
  • the data involved in the above functions may include the operating data of the actuators in the vehicle components, for example, the motion parameters of the actuators, the working status of the actuators, etc.
  • the data involved in the above functions can also be the data collected by the data collection units (for example, sensitive elements) of the vehicle components, for example, the road information of the road on which the vehicle is driving, or the weather information collected by the sensitive elements of the vehicle; this is not specifically limited in this embodiment of the present application.
  • the vehicle 200 can be divided into multiple domains (domain), and each domain has an independent domain controller (domain controller). Specifically, two kinds of domain controllers are shown in FIG. 2: the cockpit domain controller CDC 13 and the vehicle domain controller VDC 15.
  • the cockpit domain controller CDC 13 can be used to realize functional control of the cockpit area of the vehicle 200, and the vehicle components in the cockpit area can include a head up display (head up display, HUD), an instrument panel, a radio, a central control screen, navigation, a camera, etc.
  • the vehicle domain controller VDC 15 can be used to coordinate and control the power battery and the engine 141 of the vehicle to improve the power performance of the vehicle 200; the vehicle controller 132 in FIG. 1 can realize the functions of the VDC.
  • the T-BOX 12 can be used to realize the communication connection between the vehicle 200 and the internal and external equipment of the vehicle.
  • the T-BOX can obtain in-vehicle device data through the bus of the vehicle 200, and can also communicate with the user's mobile phone through a wireless network.
  • the T-BOX 12 can be included in the communication system 111 of FIG. 1 .
  • the mobile data center MDC 14 is used to output drive, transmission, steering, and braking execution control commands based on core control algorithms such as environment perception and positioning, intelligent planning and decision-making, and vehicle motion control, so as to realize automatic control of the vehicle 200; it also realizes human-computer interaction of vehicle driving information through an interactive interface.
  • the computing platform 150 in FIG. 1 may implement various functions of the MDC 14.
  • the VIUs 11 in FIG. 2 form a ring topology connection network; each VIU 11 communicates with the sensors in its immediate vicinity, and the T-BOX 12, CDC 13, MDC 14, and VDC 15 communicate with the ring topology connection network of VIUs.
  • VIU 11 can acquire information from various sensors, and report the acquired information to CDC 13, MDC 14 and VDC 15.
  • T-BOX 12, CDC 13, MDC 14, and VDC 15 can also communicate with each other.
  • the connection between VIUs can adopt, for example, Ethernet (ethernet); the connections between the VIUs and the T-BOX 12, CDC 13, MDC 14, and VDC 15 can adopt, for example, Ethernet or peripheral component interconnect express (peripheral component interconnect express, PCIe) technology; and the connection between a VIU and a sensor can adopt, for example, controller area network (controller area network, CAN), local interconnect network (local interconnect network, LIN), FlexRay, media oriented systems transport (media oriented systems transport, MOST), etc.
  • vehicle 100 shown in FIG. 1 and the vehicle 200 shown in FIG. 2 may be the same vehicle or different vehicles.
  • the technical details may refer to each other, and will not be repeated here.
  • FIG. 3 is a schematic structural diagram of an interior of a car provided by an embodiment of the present application.
  • there are multiple seating areas inside the cockpit of a car which can accommodate multiple users. Different users have different locations to choose from when riding a car.
  • the user's voice information can be collected by deploying a distributed array of sound sensors in the cockpit. Since the spacing between the sensors of a distributed array is generally much larger than the wavelength of the sound signals, the beam-based localization scheme commonly used with centralized sound sensor arrays is difficult to apply to vehicle-mounted voice interaction solutions.
  • when a car is driving, the noise is relatively large, and the sound environment outside the car is highly uncertain; both the noise inside the car and the noise outside the car can be relatively large. Such a weakly stable sound environment easily affects the sound signal energy received by different sound sensors, which easily causes misjudgment in sound source localization, so the robustness is poor.
  • an embodiment of the present application provides a solution for localizing a speech sound source, which is used to improve the accuracy and robustness of speech sound localization recognition.
  • Fig. 4 is a schematic structural diagram of a measurement system provided by an embodiment of the present application
  • Fig. 5 is a schematic diagram of an installation position of a sound collection device provided by an embodiment of the present application.
  • the system 400 includes: a sound collection device 401 and a processing device 402 , wherein the sound collection device 401 and the processing device 402 can perform data communication through wired communication or wireless communication.
  • the system 400 can be used to measure and model the sound field in a space, which can be a car cabin, or can also be an area in a room.
  • the sound collection device 401 can be used to collect sound signals in the space and obtain sound sensor information
  • the processing device 402 can be used to process the sound sensing information obtained by the sound collection device 401 to obtain the sound field conditions in the space.
  • One or more sound collection devices 401 may be provided, and the positions covered by different sound collection devices may be different. Taking a car as an example, as shown in FIG. 5, the sound collection device 401 includes sound collection devices 401a to 401e, and the above-mentioned sound collection devices are installed in different areas of the car, and are used to collect sound signals of the car cockpit from different positions.
  • FIG. 5 shows five sound collection devices, and the number of sound collection devices can be adjusted to be more or fewer during the specific implementation process.
  • the sound collection device 401 may be implemented by a sound sensor, such as a microphone, and the sound sensor may be installed in the form of a distributed array, or may also be installed in the form of multiple centralized arrays.
  • the sound collection device 401 may also be jointly implemented by a sound sensor and a sound collection card, wherein the sound collection card may be used to supply power to the sound sensor.
  • the processing device 402 may be a processing device in an automobile, such as an on-board computer or an on-board processing chip, or it may be another computing device, such as a computer, a processor, or a processing chip.
  • the sound field in the car cockpit can be measured by playing voice in different areas in the car cockpit and collecting sound signals through the sound collection device.
  • in this way, sound field conditions closer to the real environment can be obtained. For example, a tester or a test machine enters the cockpit and speaks from different positions; the sound field model in the car cockpit is established according to the positions where the tester or test machine emits sound and the sound signals collected by the sound collection device. There may be one or more testers or test machines, and they can speak at the same time or separately.
  • the accuracy of vehicle sound field modeling can be improved.
  • the above-mentioned system 400 also includes a sound playing device 403; there may be one or more sound playing devices 403, which may play simultaneously or in a time-division manner.
  • the sound playback device 403 can be realized by audio devices and the like.
  • the sound playback device 403 can also be realized by a sound playback device and a power amplifier, wherein the power amplifier can be used to adjust the volume of the voice played by the sound playback device. For example, the power amplifier amplifies the sound signal received from the processing device 402 and outputs it to the sound playing device 403 to increase the volume of the sound played by the sound playing device 403.
  • an artificial head and an artificial mouth can be used to simulate real users; this imitates the human vocalization mechanism and creates a more accurate sound field model.
  • FIG. 6 is a schematic flowchart of a method for localizing a voice source provided by an embodiment of the present application. It should be understood that the method can be performed by an electronic device, wherein the electronic device can be a complete computing device, for example, a vehicle device such as a car or an on-board machine, or a smart home device such as a smart speaker or a smart TV, or some components of a computing device, such as a chip in a car, or a processor or controller of a sound sensor. The method can also be executed by the systems shown in FIG. 1, FIG. 2, and FIG. 4. In the following, implementation by the system 400 shown in FIG. 4 is taken as an example.
  • the space includes a first area and a second area, and the first area and the second area are different areas.
  • the space may also include other areas.
  • the first area and the second area are used as examples below. Audio is played in the first area and the second area within the space.
  • method 600 includes:
  • S601 Acquire sound sensing information in the space, where the sound sensing information is determined by multiple sound sensors in the space.
  • multiple sound collection devices 401 are arranged in the space for localizing the sound source of the speech.
  • the positions of the multiple sound sensors remain unchanged in the space.
  • the sound sensor in the car can be fixed in the cockpit, and the position of the sound sensor is not adjusted when the acoustic transfer relationship is established and after the acoustic transfer relationship is established. If the position of the sound sensor used for voice source localization is adjusted, it is necessary to re-test and establish the acoustic transfer relationship.
  • the sound sensor may be a centralized sound sensor or a distributed sound sensor.
  • the sound collecting device 401 can collect the sound in the space, pre-process it or directly send it to the processing device 402 for processing. Furthermore, the processing device 402 may acquire sound sensor information in the space.
  • S602 Determine the acoustic transfer relationship of the space according to the sound sensing information, wherein the acoustic transfer relationship includes the acoustic transfer relationship of the first area and the acoustic transfer relationship of the second area, and the acoustic transfer relationship of the first area and the acoustic transfer relationship of the second area are used to determine the location of a sound source in the space.
  • the processing device 402 may determine the acoustic transfer relationship of the space according to the acquired sound sensing information, that is, the acoustic transfer relationship of the first area and the acoustic transfer relationship of the second area. Assume that the space is provided with a total of I sound sensors for voice sound source localization, where I is a positive integer greater than or equal to 2. The acoustic transfer relationship then includes, for example, the acoustic transfer relationship between the first area and the 1st sound sensor, the acoustic transfer relationship between the first area and the i-th sound sensor, the acoustic transfer relationship between the second area and the 1st sound sensor, and the acoustic transfer relationship between the second area and the i-th sound sensor.
  • the acoustic transfer relationship may be a function, a formula, a table, a corresponding relationship, etc., and this application does not limit the form.
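To make the "table / correspondence" form concrete, one possible (purely illustrative) realization is a mapping from area identifiers to per-sensor frequency responses; the area names, sensor count, FFT size, and placeholder values below are assumptions, not taken from the disclosure:

```python
import numpy as np

n_sensors, n_bins = 2, 2049   # e.g. two sensors, 4096-point real FFT
acoustic_transfer = {         # placeholder responses; measured in practice
    "driver":          np.ones((n_sensors, n_bins), dtype=complex),
    "front_passenger": np.ones((n_sensors, n_bins), dtype=complex),
    "rear_left":       np.ones((n_sensors, n_bins), dtype=complex),
    "rear_right":      np.ones((n_sensors, n_bins), dtype=complex),
}
H_driver = acoustic_transfer["driver"]   # looked up later, when localizing
```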
  • the locations of the sound sensors can be determined according to the locations of the areas to be located. For example, when there are two sound sensors, the different areas to be located should, as far as possible, not be symmetrical about the plane formed by the sound sensors.
  • FIG. 7 is an example diagram of a suggested installation position of an acoustic sensor provided in an embodiment of the present application. As shown in FIG. 7, the sound sensor 1 and the sound sensor 2 are arranged in front of the driver area and the front passenger area.
  • the sound sensing information includes first sound sensing information and second sound sensing information
  • the first sound sensing information includes sound sensor information determined by the plurality of sound sensors when the audio is played in the first area.
  • the second sound sensory information includes sound sensory information determined by the plurality of sound sensors when the audio is played in the second area.
  • the acoustic transfer relationship of the first area is determined according to the difference between the frequency domain information of the first sound sensing information determined by I-1 of the sound sensors and the frequency domain information of the first sound sensing information determined by the same remaining sound sensor among the plurality of sound sensors other than the I-1 sound sensors; the acoustic transfer relationship of the second area is determined according to the difference between the frequency domain information of the second sound sensing information determined by the I-1 sound sensors and the frequency domain information of the second sound sensing information determined by the same remaining sound sensor.
  • the audio played in the first area and the second area may be the same, so as to facilitate the establishment of an acoustic transfer relationship.
  • when establishing the acoustic transfer relationship, the difference between the frequency domain information of the sound sensing information collected by different sound sensors and the frequency domain information of the sound sensing information collected by the same reference sound sensor is used to determine the acoustic transfer relationship of each area; when applying the acoustic transfer relationship, the same kind of difference is used to localize the voice source.
  • for example, the difference between the frequency domain information of the sound sensing information collected by the 2nd to I-th sound sensors and the frequency domain information of the sound sensing information collected by the 1st sound sensor is used to determine the acoustic transfer relationship of each area, and the same difference is used when localizing the voice source.
  • for another example, the difference between the frequency domain information of the sound sensing information collected by the 1st to (I-1)-th sound sensors and the frequency domain information of the sound sensing information collected by the I-th sound sensor is used both to establish the acoustic transfer relationship and to localize the voice source.
  • optionally, the method further includes: acquiring the audio. For example, the audio used for establishing the acoustic transfer relationship can be acquired by a sound sensor placed next to the sound playback device 403. Directly acquiring the source signal (that is, the audio signal) of the sound playback device 403 to establish the acoustic transfer relationship can further improve the accuracy of the acoustic transfer relationship and the accuracy of the voice source localization.
  • the acoustic transfer relationship in the space may be determined according to the frequency domain information of the sound sensing information and the frequency domain information of the audio.
  • the audio includes a first audio and a second audio, and the first audio and the second audio may be the same or different.
  • the sound sensing information includes first sound sensing information and second sound sensing information, the first sound sensing information is the sound sensing information obtained when the first audio is played in the first area, and the second sound sensing information is the sound sensing information obtained when the second audio is played in the second area. The acoustic transfer relationship of the first area is determined according to the ratio of the first sound sensing information to the first audio, and the acoustic transfer relationship of the second area is determined according to the ratio of the second sound sensing information to the second audio.
  • the number of acoustic sensors can be reduced, reducing hardware costs and computing costs.
  • FIG. 8 is a schematic flowchart of a method for localizing a voice source provided by an embodiment of the present application. It should be understood that the method can be performed by an electronic device, wherein the electronic device can be a complete computing device, for example, a vehicle device such as a car or an on-board machine, or a smart home device such as a smart speaker or a smart TV, or some components of a computing device, such as a chip in a car, or a processor or controller of a sound sensor. The method can also be executed by the systems shown in FIG. 1, FIG. 2, and FIG. 4. In the following, implementation by the system 400 shown in FIG. 4 is taken as an example.
  • the space includes a first area and a second area, and the first area and the second area are different areas.
  • the space may also include other areas.
  • the first region and the second region are used as examples below.
  • method 800 includes:
  • S801 Acquire sound sensor information of a first voice, where the sound sensor information is determined by multiple sound sensors.
  • multiple sound collection devices 401 are arranged in the space for localizing the sound source of the speech.
  • the positions of the multiple sound sensors remain unchanged in the space.
  • the sound sensor in the car can be fixed in the cockpit, and the position of the sound sensor is not adjusted when the acoustic transfer relationship is established and after the acoustic transfer relationship is established. If the position of the sound sensor used for voice source localization is adjusted, it is necessary to re-test and establish the acoustic transfer relationship.
  • the sound sensor may be a centralized sound sensor or a distributed sound sensor.
  • S802 Determine, according to the sound sensing information and the acoustic transfer relationship, the sound source position of the first voice from multiple areas in the space, where the acoustic transfer relationship is used to indicate the transfer relationship between the sound sensing information collected by the multiple sound sensors and the one or more areas when audio is played in one or more areas in the space; the acoustic transfer relationship is predetermined based on non-free-field conditions.
  • the sound sensing information and the acoustic transfer relationship determine the power sum of multiple areas in the space, and determine the area corresponding to the maximum power sum in the multiple areas as the sound source of the first voice Location. For example, if five sound sensors are set in the space, and the area corresponding to the maximum power sum is the first area, then the position of the sound source of the first voice can be located as the first area.
• alternatively, a threshold may be set; when the power sum of an area is greater than the threshold, it is determined that there is a sound source in that area.
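• As an illustration only, the decision rule above can be sketched as follows in Python; the function name locate_source and the array layout are assumptions made for this sketch, not part of the embodiment.

```python
import numpy as np

def locate_source(power_sums, threshold=None):
    """Pick sound-source areas from per-area power sums.

    power_sums: 1-D array where power_sums[m] is the power sum of area m.
    threshold:  if given, every area above it is reported as a source;
                otherwise only the area with the maximum power sum is.
    """
    power_sums = np.asarray(power_sums)
    if threshold is not None:
        return np.flatnonzero(power_sums > threshold)  # all areas with a source
    return np.array([int(np.argmax(power_sums))])      # single loudest area
```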
  • the number of acoustic sensors can be reduced, reducing hardware costs and computing costs.
• FIG. 9 is a schematic flowchart of a method for localizing a speech sound source according to an embodiment of the present application. It should be understood that the method may be performed by an electronic device, where the electronic device may be a complete computing device, for example, a car or a vehicle-mounted device, or may be a component applied to a computing device, such as a chip in a vehicle-mounted device, or a processor or controller of a sound sensor. The method can also be executed by the systems shown in FIG. 1, FIG. 2 and FIG. 4. In the following, an implementation by the system 400 shown in FIG. 4 is taken as an example. Referring to FIG. 9, the method 900 includes:
  • S901 Play a first sound signal in a first area.
  • the sound playing device 403 can play audio, such as the first sound signal, in different areas in the space (area 1 to area 5 as shown in FIG. 3 ).
  • the first area may be one area, or may be multiple areas.
  • the first sound signal can be set as a broadband sound signal, for example, white noise or pink noise.
  • the general frequency range of the broadband sound signal may be 50 Hz (Hertz, Hz) to 4000 Hz, or 50 Hz to 2000 Hz, or 20 Hz to 20000 Hz, and so on. It should be understood that the value of the broadband sound signal here is only an example, which is not limited in the present application.
  • the broadband sound signal here may be preset, or generated by the processing device 402 and sent to the sound playing device 403 .
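• For illustration, a broadband excitation of the kind described above could be generated as in the following sketch; the sampling rate, duration, and function name are assumptions made for the sketch.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def make_broadband_excitation(fs=16000, duration_s=5.0, band=(50.0, 4000.0), seed=0):
    """Generate band-limited white noise as a calibration excitation signal."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(int(fs * duration_s))            # white noise
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")  # e.g. 50-4000 Hz
    x = sosfilt(sos, noise)                                       # limit to the band
    return x / np.max(np.abs(x))                                  # avoid clipping on playback
```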
  • the processing device 402 may also record the playing time information and area information of the above-mentioned first sound signal, in various forms such as tables, functions, time stamps, correspondences, etc., which is not limited in this application.
  • the area information may be one or more of the number of the area, the position of the area and other information.
  • the time stamp may be at the microsecond level, and the processing device 402 may perform signal alignment and synchronization according to the collected sound sensor information and the time stamp of the first sound signal.
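• As a minimal sketch of such timestamp-based alignment (the function name and 1-D signal layout are assumptions of the sketch), the two recordings can be trimmed to a common start time:

```python
def align_by_timestamp(sig_a, t0_a, sig_b, t0_b, fs):
    """Trim two 1-D recordings to a common start using their start timestamps (seconds)."""
    offset = int(round((t0_b - t0_a) * fs))  # samples by which sig_a starts earlier
    if offset >= 0:
        return sig_a[offset:], sig_b
    return sig_a, sig_b[-offset:]
```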
  • S902 Acquire sound sensing information.
  • the sound collection device 401 can collect sound signals in a space, for example, sound signals in a cockpit.
• the sound collection device 401 may also perform one or more of the operations of storing, processing, and sending the above sound signal; for example, it may send the sound signal to the processing device 402, which performs the subsequent processing.
  • S903 Determine an acoustic transfer relationship of the first region according to the sound sensing information and the first sound signal.
• the processing device 402 can determine the acoustic transfer relationship of a region according to the correspondence between the sound sensing information and the first sound signal, for example, the acoustic transfer relationships of different seat areas in a car, or of different areas in a room.
  • the form of the acoustic transfer relationship may be a table, a function, a corresponding relationship, etc., which is not limited in this application.
• the above acoustic transfer relationship can be used for voice sound source localization.
  • each area to be located has an acoustic transfer relationship.
• during localization, sound sensing information can be obtained, the power sums of different areas are determined according to the sound sensing information, and the position of the voice source is then determined.
  • An exemplary calculation method and application method of the acoustic transfer relationship are given below.
• An array of sound sensors is set in the space to be measured, containing a total of I microphone units, where I is a positive integer greater than 1. The space to be measured includes M areas, where M is a positive integer greater than 1.
• For the m-th area, the frequency-domain signal corresponding to the sound signal emitted by the sound playing device in this area is denoted as $X_m(f)$, where $0 < m \le M$.
• The acoustic transfer relationship corresponding to the i-th sensor unit in the m-th area is denoted as $W_i^m(f)$, where $0 < i \le I$.
• The frequency-domain signal corresponding to the sound signal received by the i-th sensor unit when the sound signal is played in the m-th area is denoted as $Y_i^m(f)$, where $Y_i^m(f) = W_i^m(f)\,X_m(f)$.
• Since $Y_i^m(f)$ and $X_m(f)$ are both available information, the acoustic transfer relationship of the m-th area corresponding to the i-th sensor unit can be determined according to the following formula: $W_i^m(f) = Y_i^m(f) / X_m(f)$.
• During localization, let $Y_i(f)$ denote the frequency-domain signal of the voice received by the i-th sensor unit. The power sum of the m-th area can then be determined, for example, as $P_m = \sum_{f} \big| \sum_{i=1}^{I} \big(W_i^m(f)\big)^{*}\, Y_i(f) \big|^{2}$. The area whose power sum is greater than a preset threshold is determined as the sound source position, or the area corresponding to the maximum power sum is determined as the sound source position.
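• A minimal Python sketch of the two phases above is given below, assuming the spectra are already available as NumPy arrays; the array shapes, the small eps regularizer, and the matched-filter combination are assumptions of this sketch rather than a definitive implementation of the embodiment.

```python
import numpy as np

def estimate_transfer(Y_cal, X_cal, eps=1e-12):
    """W[m, i, f] = Y[m, i, f] / X[m, f] from calibration recordings.

    Y_cal: complex array (M areas, I sensors, F bins) -- recorded spectra
    X_cal: complex array (M areas, F bins)            -- played spectra
    """
    return Y_cal / (X_cal[:, None, :] + eps)

def power_sums(W, Y):
    """Power sum per area for an observed multi-sensor spectrum Y[i, f]."""
    # combine sensors coherently with the conjugate transfer relationship,
    # then accumulate power over frequency
    combined = np.sum(np.conj(W) * Y[None, :, :], axis=1)   # shape (M, F)
    return np.sum(np.abs(combined) ** 2, axis=1)            # shape (M,)

# localization: e.g. the area with the maximum power sum
# m_hat = int(np.argmax(power_sums(W, Y)))
```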
• FIG. 10 is a schematic flowchart of a method for localizing a speech sound source according to an embodiment of the present application. It should be understood that the method may be performed by an electronic device, where the electronic device may be a complete computing device, for example, a car or a vehicle-mounted device, or may be a component applied to a computing device, such as a chip in a vehicle-mounted device, or a processor or controller of a sound sensor. The method can also be executed by the systems shown in FIG. 1, FIG. 2 and FIG. 4. In the following, an implementation by the system 400 shown in FIG. 4 is taken as an example. Referring to FIG. 10, the method 1000 includes:
  • S1001 Play a first sound signal in a first area and a second area respectively, where the first area is different from the second area.
  • the sound playing device 403 can play audio, such as the first sound signal, in different areas in the space (area 1 to area 5 shown in FIG. 3 ), wherein the first area and the second area are different.
  • the first sound signal can be set as a white noise signal.
  • the white noise signal here may be preset, or generated by the processing device 402 and sent to the sound playing device 403 .
  • the processing device 402 may also record the playing time information and area information of the above-mentioned first sound signal, in various forms such as tables, functions, time stamps, correspondences, etc., which is not limited in this application. Wherein, the time information may be at the second level, or at the millisecond level.
  • the area information may be one or more of area number, area position, size and other information.
  • S1002 Acquire sound sensing information, where the sound sensing information includes first sound sensing information corresponding to the first area and second sound sensing information corresponding to the second area.
• when the sound playing device 403 plays the sound signal in the first area and in the second area respectively, the sound signals in the space are collected by the sound collection device 401, and the acquired sensing information is recorded as the first sound sensing information and the second sound sensing information, respectively.
  • S1003 According to the sound sensing information and the first sound signal, determine the difference in the acoustic transfer relationship between the first area and the second area.
• the processing device 402 may determine the difference in the acoustic transfer relationship between different areas according to the correspondence between the sound sensing information and the first sound signal, for example, between different seat areas in a car, or between different areas in a room.
• The space to be measured includes M areas and is provided with a distributed sound sensor array containing a total of I microphone units, where M and I are positive integers greater than 1.
• For the m-th area, the frequency-domain signal corresponding to the sound signal emitted by the sound playing device in this area is denoted as $X(f)$, where $0 < m \le M$.
• The acoustic transfer relationship corresponding to the i-th group of sensor units in the m-th area is denoted as $W_i^m(f)$, where $0 < i \le I$.
• The frequency-domain signal corresponding to the sound signal received by the i-th group of sensor units when the sound signal is played in the m-th area is denoted as $Y_i^m(f)$, where $Y_i^m(f) = W_i^m(f)\,X(f)$.
• Here, $Y_i^m(f)$ is available information, while $X(f)$ is unknown information.
• In this case, the differences between areas can be determined using the sound signals obtained from the different channels formed by the distributed sound sensor array. Specifically, for the m-th area, the ratio of the frequency-domain signal of each of the second to I-th channels to the frequency-domain signal of the first channel satisfies the following relationship: $\frac{Y_i^m(f)}{Y_1^m(f)} = \frac{W_i^m(f)\,X(f)}{W_1^m(f)\,X(f)} = \frac{W_i^m(f)}{W_1^m(f)}$, where $2 \le i \le I$.
• The ratio $W_i^m(f) / W_1^m(f)$, denoted as $\tilde{W}_i^m(f)$, can be used to assess the differences in the acoustic transfer relationships and the energy differences between different areas. That is, the difference in the acoustic transfer relationship between areas can be determined according to the differences between the sound signals received by different sensor receiving channels, without knowing $X(f)$.
• During localization, let $Y_i(f)$ denote the frequency-domain signal of the voice received by the i-th channel. The power sum of the m-th area can then be determined, for example, as $P_m = \sum_{f} \big| \sum_{i=2}^{I} \big(\tilde{W}_i^m(f)\big)^{*}\, Y_i(f) \big|^{2}$. The area whose power sum is greater than a preset threshold is determined as the sound source position, or the area corresponding to the maximum power sum is determined as the sound source position.
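• Analogously, below is a sketch of the relative-transfer variant, under the same assumptions about array shapes and with the first channel (index 0) as the reference channel; it is illustrative only.

```python
import numpy as np

def estimate_relative_transfer(Y_cal, eps=1e-12):
    """Relative transfer relationships from a calibration with unknown excitation.

    Y_cal: complex array (M areas, I channels, F bins)
    Returns W_rel[m, i, f] = Y_cal[m, i, f] / Y_cal[m, 0, f].
    """
    return Y_cal / (Y_cal[:, :1, :] + eps)

def power_sums_relative(W_rel, Y):
    """Power sum per area from an observed spectrum Y[i, f]."""
    # the reference channel itself carries no between-area information here
    combined = np.sum(np.conj(W_rel[:, 1:, :]) * Y[None, 1:, :], axis=1)  # (M, F)
    return np.sum(np.abs(combined) ** 2, axis=1)                          # (M,)
```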
  • FIG. 11 is an example diagram of an installation manner of a system 400 provided in an embodiment of the present application.
• the devices involved in the system 400 are installed inside and outside the car, and the artificial mouth is placed at a position where a passenger typically sits, such as one or more of areas 1 to 5 shown in FIG. 3.
• the processing device generates a white noise signal; one channel is sent to the artificial mouth through the power amplifier and played by the artificial mouth, and the other channel is sent to the sound acquisition card.
  • the sound acquisition card can record the sound signal collected by the distributed array and the white noise signal sent by the processing device (also can be understood as the sound signal sent by the artificial mouth).
  • Fast Fourier transform is performed on the sound signals of different channels collected by the distributed array, and the frequency domain signals are obtained.
  • the acoustic transfer relationship of different areas in the cockpit is determined. For example, the acoustic transfer relationship in each seating zone is calculated according to the manner described in method 900 .
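• As a sketch of this per-area calibration step starting from time-domain capture-card data (the FFT size and function name are assumptions of the sketch), the recorded and reference signals are converted to the frequency domain and their ratio is taken, in the manner of method 900:

```python
import numpy as np

def calibrate_area(recorded, reference, n_fft=4096, eps=1e-12):
    """Acoustic transfer relationship of one area from time-domain data.

    recorded:  (I channels, N samples) signals collected by the distributed array
    reference: (N samples,) white-noise signal sent to the artificial mouth
    Returns W[i, f] for this area.
    """
    Y = np.fft.rfft(recorded, n=n_fft, axis=-1)   # per-channel spectra
    X = np.fft.rfft(reference, n=n_fft)           # reference spectrum
    return Y / (X[None, :] + eps)
```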
• the devices involved in the system 400 are installed inside and outside the car, and the artificial mouth is placed at a position where a passenger typically sits, such as areas 1 to 3 shown in FIG. 3.
• the processing device generates a white noise signal, which is sent only to the artificial mouth through the power amplifier and played by the artificial mouth.
• the sound acquisition card can record the sound signals collected by the distributed array; fast Fourier transform is performed on the sound signals of the different channels collected by the distributed array to obtain frequency-domain signals. According to these frequency-domain signals, the differences in the acoustic transfer relationships of different areas in the cockpit are determined. For example, the acoustic transfer relationship differences between the seating areas are calculated in the manner described in method 1000.
• the non-free-field model is used to model the acoustic transfer relationship of the sound field in the space more accurately, and its strong anti-interference ability can improve the accuracy and robustness of voice source localization. For the same car model, the different areas only need to be measured once, rather than once for each vehicle, so the measurement cost is low.
  • the above method has lower requirements on the number of sound collection devices, which can reduce the use of devices such as sound sensor arrays and audio transmission lines, and reduce hardware costs. At the same time, communication channels and computing resources are also reduced.
  • Fig. 12 is a schematic structural diagram of a speech sound source localization device provided by an embodiment of the present application.
• the speech sound source localization device 1200 may be the electronic device in the embodiment of the present application, where the electronic device may be a complete computing device, for example, a vehicle-mounted device such as a car or an in-vehicle head unit, or a smart home device such as a smart speaker or a smart TV; it may also be a component applied to a computing device, for example, a chip in a vehicle, or a processor or controller of a sound sensor.
• The device can implement the methods described in FIG. 6, FIG. 8, FIG. 9 and FIG. 10, as well as the above optional embodiments.
• As shown in FIG. 12, the speech sound source localization device 1200 includes: a processor 1201, and a memory 1202 coupled with the processor 1201. It should be understood that although only one processor and one memory are shown in FIG. 12, the speech sound source localization device 1200 may include other numbers of processors and memories.
• the memory 1202 is used to store computer programs or computer instructions, and these computer programs or instructions can be divided into two categories according to their functions. By executing one category of these computer programs or instructions, the speech sound source localization device 1200 implements the steps of the speech sound source localization method in the embodiments of the present application. Such computer programs or instructions can be denoted as positioning function programs.
  • the positioning function program may include program codes for realizing the method for localizing the voice and sound source described in one or more figures in FIG. 6 , FIG. 8 , FIG. 9 and FIG. 10 .
• the processor 1201 and the memory 1202 may instead be implemented by a processing unit and a storage unit, where the processing unit and the storage unit may be implemented by code having the corresponding functions.
• the storage unit is used to store program instructions; the processing unit is used to execute the program instructions in the storage unit, so as to implement the speech sound source localization method shown in any one of FIG. 6, FIG. 8, FIG. 9 and FIG. 10, and the above optional embodiments.
  • Fig. 13 is a schematic structural diagram of a speech sound source localization device provided by an embodiment of the present application.
• the speech sound source localization device may be the electronic device in the embodiment of the present application, where the electronic device may be a complete computing device, for example, a vehicle-mounted device such as a car or an in-vehicle head unit, or a smart home device such as a smart speaker or a smart TV; it may also be a component applied to a computing device, for example, a chip in a car, or a processor or controller of a sound sensor.
• The device can implement the methods described in FIG. 6, FIG. 8, FIG. 9 and FIG. 10, as well as the above optional embodiments.
• As shown in FIG. 13, the speech sound source localization device 1300 includes: a processor 1301, and an interface circuit 1302 coupled with the processor 1301. It should be understood that although only one processor and one interface circuit are shown in FIG. 13, the speech sound source localization device 1300 may include other numbers of processors and interface circuits.
  • the interface circuit 1302 is used to communicate with other components of the electronic device, such as memory or other processors.
  • the processor 1301 is used to perform signal interaction with other components through the interface circuit 1302 .
  • the interface circuit 1302 may be an input/output interface of the processor 1301 .
  • the processor 1301 reads computer programs or instructions in the memory coupled to it through the interface circuit 1302, and decodes and executes these computer programs or instructions.
  • these computer programs or instructions may include the above-mentioned positioning function program, and may also include the above-mentioned function program of the voice and sound source localization device applied in the electronic device.
  • the electronic device or the speech sound source localization device in the electronic device can realize the solution in the speech sound source localization method provided by the embodiment of the present application.
• these positioning function programs are stored in a memory outside the speech sound source localization device 1300.
• when the positioning function program is decoded and executed by the processor 1301, part or all of the content of the positioning function program is temporarily stored in the memory.
  • these localization function programs are stored in a memory inside the speech sound source localization device 1300 .
  • the voice and sound source locating device 1300 can be set in the car or smart home in the embodiment of the present application.
  • part of the content of these positioning function programs is stored in a memory outside the device 1300 for locating a voice and sound source, and other parts of the content of these positioning function programs are stored in a memory inside the device 1300 for locating a voice and sound source.
  • Fig. 14 is a schematic structural diagram of a speech sound source localization device provided by an embodiment of the present application.
• the speech sound source localization device may be the electronic device in the embodiment of the present application, where the electronic device may be a complete computing device, for example, a vehicle-mounted device such as a car or an in-vehicle head unit, or a smart home device such as a smart speaker or a smart TV; it may also be a component applied to a computing device, for example, a chip in a car, or a processor or controller of a sound sensor.
• The device can implement the methods described in FIG. 6, FIG. 8, FIG. 9 and FIG. 10, as well as the above optional embodiments.
• As shown in FIG. 14, the speech sound source localization device 1400 includes: a processing unit 1401, and a transceiver unit 1402 coupled to the processing unit 1401. It should be understood that although only one processing unit and one transceiver unit are shown in FIG. 14, the speech sound source localization device 1400 may include other numbers of processing units and transceiver units.
• the processing unit 1401 can be used to implement the processing actions of the methods described in one or more of FIG. 6, FIG. 8, FIG. 9 and FIG. 10 and the above optional embodiments, and the transceiver unit 1402 can be used to implement the acquisition and transceiving actions of those methods.
  • the processing unit 1401 may be used to execute S602, and the transceiver unit 1402 may be used to execute S601 and S603.
  • the processing unit 1401 may be used to perform S802, and the transceiver unit 1402 may be used to perform S801.
  • the processing unit 1401 may be used to execute S901 and S903, and the transceiver unit 1402 may be used to execute S902.
  • the processing unit 1401 may be used to execute S1001 and S1003, and the transceiver unit 1402 may be used to execute S1002.
  • the speech sound source localization device in the embodiment of the present application may also be implemented by hardware.
  • the processing unit 1401 may be implemented by the processor 1301
  • the transceiver unit 1402 may be implemented by the interface circuit 1302 .
• the receiving and sending functions of the transceiver unit 1402 may be implemented by the same physical entity or by different physical entities. For example, when they are different physical entities, they may be referred to as a receiver and a transmitter; when they are the same physical entity, they may be collectively referred to as a transceiver unit or a transceiver.
• the speech sound source localization device in the embodiment of the present application can also be implemented by software, for example, by a computer program or instructions having the above functions; the corresponding computer program or instructions can be stored in the internal memory of the electronic device, and the processor reads the corresponding computer programs or instructions from the memory to implement the above functions.
  • the speech sound source localization device in the embodiment of the present application may also be implemented by a combination of a processor and a software module.
• the systems shown in FIG. 1 to FIG. 2 and the speech sound source localization device shown in any one of FIG. 12 to FIG. 14 can be combined with each other. The relevant design details of the speech sound source localization device and of each optional embodiment can refer to each other, and can also refer to the speech sound source localization method shown in any one of FIG. 6, FIG. 8, FIG. 9 and FIG. 10 and the relevant design details of each optional embodiment. Details are not repeated here.

Abstract

A speech sound source localization method, apparatus (1200, 1300, 1400) and system (400) can be used to localize a voice source in a space. The speech sound source localization method includes: acquiring sound sensing information of a first voice, the sound sensing information being determined by means of a plurality of sound sensors (S601, S801); and, according to the sound sensing information and an acoustic transfer relationship, determining a sound source position of the first voice from among a plurality of areas of the space, the acoustic transfer relationship being used to represent, when audio is played in one or more areas of the space, a transfer relationship between the sound sensing information collected by the plurality of sound sensors and the one or more areas, the acoustic transfer relationship being predetermined based on a non-free-field condition (S802).