Results on VCTK Demand

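| Model                                  | PESQ | CSIG | CBAK | COVL | STOI (%) | Latency (ms) |
|----------------------------------------|------|------|------|------|----------|--------------|
| Noisy                                  | 1.97 | 3.34 | 2.44 | 2.63 | 92.1     | -            |
| NSNet2 (Braun et al. 2020)             | 2.47 | 3.23 | 2.99 | 2.9  | 90.3     | 20           |
| FullSubNet+ (Chen et al. 2022)         | 2.88 | 3.86 | 3.42 | 3.57 | 94.0     | 32           |
| FRCRN (Zhao et al. 2022)               | 3.21 | 4.26 | 3.64 | 3.73 | -        | 40           |
| DeepFilterNet2 (Schröter et al. 2022)  | 3.08 | 4.3  | 3.4  | 3.7  | 94.3     | 40           |
| DeepFilterNet3 (Schröter et al. 2023)  | 3.17 | 4.34 | 3.61 | 3.77 | 94.4     | 40           |
| DeepFilterNet3*                        | 3.20 | 4.37 | 3.56 | 3.80 | 94.7     | 40           |
| HSTN-Small*                            | 3.04 | 4.17 | 3.53 | 3.61 | 93.8     | 20           |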

* denotes models that were trained in-house.
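To make the comparison against the unprocessed input easier to read, the following minimal sketch computes each model's gain over the "Noisy" row for the four MOS-style metrics. The scores are copied from the table above; the function name `gains_over_noisy` and the data layout are illustrative assumptions, not part of any released code.

```python
# Scores copied from the results table: (PESQ, CSIG, CBAK, COVL).
# STOI and latency are left out because one STOI entry is missing.
METRICS = ("PESQ", "CSIG", "CBAK", "COVL")
NOISY = (1.97, 3.34, 2.44, 2.63)
MODELS = {
    "NSNet2": (2.47, 3.23, 2.99, 2.9),
    "FullSubNet+": (2.88, 3.86, 3.42, 3.57),
    "FRCRN": (3.21, 4.26, 3.64, 3.73),
    "DeepFilterNet2": (3.08, 4.3, 3.4, 3.7),
    "DeepFilterNet3": (3.17, 4.34, 3.61, 3.77),
    "DeepFilterNet3*": (3.20, 4.37, 3.56, 3.80),
    "HSTN-Small*": (3.04, 4.17, 3.53, 3.61),
}


def gains_over_noisy(scores, baseline=NOISY):
    """Return per-metric differences (model minus noisy input), rounded to 2 decimals."""
    return tuple(round(s - b, 2) for s, b in zip(scores, baseline))


if __name__ == "__main__":
    # Print one line per model with the signed gain for each metric.
    for name, scores in MODELS.items():
        gains = gains_over_noisy(scores)
        cells = ", ".join(f"{m} {g:+.2f}" for m, g in zip(METRICS, gains))
        print(f"{name}: {cells}")
```

A negative value means the model scores below the noisy input on that metric, as happens for NSNet2 on CSIG.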

Window Design for Dereverberation

DNSMOS OVRL, SIG, and BAK scores for models trained with different dereverberation parameters, tested under close-microphone and distant-microphone scenarios in small and large rooms.

Audio Examples

Examples from the dereverberation grid: [Link]. They are provided as a PowerPoint file because there are many examples for each case. Note that these models were trained only on the VCTK Clean Speech dataset.

Comparison with State-of-the-art

Audio Examples

Example 1

Female voice, T60=1.0s; Distance to mic. is 10 meters

Ground Truth

Noisy and Reverberant

DeepFilterNet3

DeepFilterNet3-d.m.

NSNet2

FullSubNet+

HS-TasNet-d.m.


Example 2

Male voice, T60=1.0s; Distance to mic. is 4 meters

Ground Truth

Noisy and Reverberant

DeepFilterNet3

DeepFilterNet3-d.m.

NSNet2

FullSubNet+

HS-TasNet-d.m.


Example 3

Female voice, T60=1.0s; Distance to mic. is 4 meters

Ground Truth

Noisy and Reverberant

DeepFilterNet3

DeepFilterNet3-d.m.

NSNet2

FullSubNet+

HS-TasNet-d.m.


Example 4

Male voice, T60=1.0s; Distance to mic. is 10 meters

Ground Truth

Noisy and Reverberant

DeepFilterNet3

DeepFilterNet3-d.m.

NSNet2

FullSubNet+

HS-TasNet-d.m.