DOA error correction through feedback of the speech enhancement quality.
Three ROS2 nodes are provided:
-
demucs
: a variation of the original Demucs Denoiser model, that uses a location-based strategy for target selection. It subscribes to thejackaudio
topic that is published by thebeamform2
ROS2 node, and publishes thejackaudio_filtered
topic that is the result of enhancing the speech from the beamforming output. -
online_sqa
: a SQUIM-based online speech quality estimator. It subscribes to thejackaudio_filtered
topic, and publishes theSDR
topic that is the speech quality estimation from the enhanced speech. -
doaoptimizer
: it aims to optimize the speech quality by correcting the direction of arrival (DOA) that is fed to a beamformer, based on the Adam optimizer. It subscribes to theSDR
topic, and publishes a thetheta
topic that is the corrected DOA.
The theta
topic is then subscribed to by the beamform2
node, closing the feedback loop.
-
Install and configure
jackaudio
. -
Clone and compile the
beamform2
ROS2 node. -
Configure the
beamform_config.yaml
ofbeamform2
so that it matches your microphone setup. -
Configure the
rosjack_config.yaml
ofbeamform2
so that:- Its output is fed through a ROS2 topic:
output_type
should be either0
or2
. - Its sampling rate matches the one that
demucs
was trained with:ros_output_sample_rate
should be16000
.
- Its output is fed through a ROS2 topic:
-
Install the python requirements of all the nodes:
pip install -r requirements.txt
-
Create a ROS2 package, place the
demucs
,doaoptimizer
andonline_sqa
directories inside the package'ssrc
directory, and runcolcon build
.
-
Start the
jackaudio
server. -
Start the
phase
beamformer frombeamform2
:ros2 launch beamform2 phase.launch
-
Run
demucs
:ros2 run demucs demucs
-
Start
jack_write
frombeamform2
to listen to the result:ros2 launch beamform2 rosjack_write.launch
-
Run
online_sqa
:ros2 run online_sqa online_sqa
-
Run
doaoptimizer
:ros2 run doaoptimizer doaoptimizer
If you have an initial DOA estimate of the source of interest (SOI), you can provide it to doaoptimizer
through the init_doa
parameter. For example, if the SOI is located at 20 degrees, run doaoptimizer
with:
ros2 run doaoptimizer doaoptimizer --ros-args -p init_doa:=20.0
Also, the default learning rate is set at 0.15, since it provided good results in our tests. However, if you want to change it at run-time, you can do so through the eta
parameter. For example, if the SOI is located at 20 degrees, and you want to have a learning rate of 0.1, run doaoptimizer
with:
ros2 run doaoptimizer doaoptimizer --ros-args -p init_doa:=20.0 -p eta:=0.1
The jackaudio_filtered
topic provides the DOA-corrected enhanced speech.
If you end up using this software, please credit it as:
@article{rascon2024direction,
title={Direction of Arrival Correction through Speech Quality Feedback},
author={Rascon, Caleb},
journal={Digital Signal Processing},
pages={104960},
year={2024},
publisher={Elsevier}
}
You can also have a look at its arxiv version:
@article{rascon2024direction,
title={Direction of Arrival Correction through Speech Quality Feedback},
author={Rascon, Caleb},
journal={arXiv preprint arXiv:2408.07234},
year={2024}
}