
Results on VCTK-DEMAND

Model	PESQ	CSIG	CBAK	COVL	STOI (%)	Lat. (ms)
Noisy	1.97	3.34	2.44	2.63	92.1	-
NSNet2 (Braun et al. 2020)	2.47	3.23	2.99	2.90	90.3	20
FullSubNet+ (Chen et al. 2022)	2.88	3.86	3.42	3.57	94.0	32
FRCRN (Zhao et al. 2022)	3.21	4.26	3.64	3.73	-	40
DeepFilterNet2 (Schröter et al. 2022)	3.08	4.30	3.40	3.70	94.3	40
DeepFilterNet3 (Schröter et al. 2023)	3.17	4.34	3.61	3.77	94.4	40
DeepFilterNet3*	3.20	4.37	3.56	3.80	94.7	40
HSTN-Small*	3.04	4.17	3.53	3.61	93.8	20

* denotes models trained in-house.
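
To make the quality/latency trade-off in the table easier to compare, here is a minimal Python sketch. It copies the scores from the table as they appear above and computes each model's PESQ gain over the noisy input, plus the best-PESQ model within a latency budget. The names RESULTS, pesq_gain, and best_pesq_at_latency are hypothetical helpers introduced only for this illustration; they are not part of any released code.

```python
# Scores copied from the table above:
# (PESQ, CSIG, CBAK, COVL, STOI in %, latency in ms).
# None marks a value the table reports as "-".
RESULTS = {
    "Noisy": (1.97, 3.34, 2.44, 2.63, 92.1, None),
    "NSNet2": (2.47, 3.23, 2.99, 2.90, 90.3, 20),
    "FullSubNet+": (2.88, 3.86, 3.42, 3.57, 94.0, 32),
    "FRCRN": (3.21, 4.26, 3.64, 3.73, None, 40),
    "DeepFilterNet2": (3.08, 4.30, 3.40, 3.70, 94.3, 40),
    "DeepFilterNet3": (3.17, 4.34, 3.61, 3.77, 94.4, 40),
    "DeepFilterNet3*": (3.20, 4.37, 3.56, 3.80, 94.7, 40),
    "HSTN-Small*": (3.04, 4.17, 3.53, 3.61, 93.8, 20),
}


def pesq_gain(model):
    """PESQ improvement of `model` over the unprocessed noisy input."""
    return round(RESULTS[model][0] - RESULTS["Noisy"][0], 2)


def best_pesq_at_latency(max_latency_ms):
    """Name of the highest-PESQ model whose latency fits the budget."""
    candidates = {
        name: row[0]
        for name, row in RESULTS.items()
        if row[5] is not None and row[5] <= max_latency_ms
    }
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    for name in RESULTS:
        if name != "Noisy":
            print(f"{name}: PESQ gain {pesq_gain(name):+.2f}")
    print("Best PESQ at <= 20 ms:", best_pesq_at_latency(20))
```

The latency filter skips the Noisy row because it has no latency entry; the numbers themselves are only a transcription of the table and carry no information beyond it.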

Window Design for Dereverberation

DNSMOS OVRL, SIG, and BAK scores for the model trained with different dereverberation parameters, tested under close- and distant-microphone scenarios in small and large rooms.

Audio Examples

Examples from the dereverberation grid: [Link]. They are provided as a PowerPoint file because there are many examples for each case. Note that these models were trained only on the VCTK clean speech dataset.