IEEE SP Cup 2026 | Team SuperZooooom | Two-Stage Speech Enhancement Framework for Audio Zooming
Author: yang
Uploaded: 2026-02-15
Views: 36
Description:
A Dual-Microphone Two-Stage Speech Enhancement Framework for Audio Zooming
Team SuperZooooom for IEEE Signal Processing Cup 2026
Students: Zhixiang Tang, Yanxin Tian, Gengyou Liu, Yongyi Deng
Tutor: Yujie Zhu
Supervisor: Gongping Huang
Audio zooming aims to enhance sound sources of interest aligned
with visual focus while suppressing interference sources, which is
particularly challenging for dual-microphone smartphones due to
limited spatial resolution and constrained computational resources.
Existing methods either suffer from degraded spatial selectivity in
adverse acoustic conditions or rely on computationally intensive
models that are unsuitable for on-device deployment. This paper
proposes a two-stage cascaded audio–visual zooming framework for
dual-microphone smartphones, achieving consistent improvements
under both anechoic and reverberant conditions. In the first stage,
a directionally guided enhancement network exploits directional
priors by comparing observed inter-microphone phase differences
(IPDs) with theoretical IPDs, thereby improving spatial separation.
In the second stage, a single-channel enhancement model is used
to jointly refine amplitude and phase spectra and suppress residual
noise, with knowledge distillation applied to reduce model
complexity. Experiments conducted following the IEEE Signal
Processing Cup 2026 protocol show significant improvements in OSINR,
SI-SNR, STOI, PESQ, and ViSQOL, demonstrating the effectiveness and
competitiveness of the proposed framework.
School of Electronic Information
Wuhan University
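
To make the first-stage idea concrete, here is a minimal sketch of comparing an observed IPD with a theoretical one for a single time-frequency bin. The description does not say how the network performs this comparison, so everything here is an assumption for illustration: the far-field plane-wave model, the cosine similarity feature, the speed of sound, and the function names (theoretical_ipd, observed_ipd, directional_similarity) are all hypothetical and not taken from the team's implementation.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s at room temperature (assumed value)


def theoretical_ipd(freq_hz, mic_distance_m, angle_rad):
    """Far-field IPD (radians, unwrapped) for a source at angle_rad.

    The angle is measured from the array axis; at angle 0 the wavefront
    reaches microphone 1 first and microphone 2 lags by d / c.
    """
    delay = mic_distance_m * math.cos(angle_rad) / SPEED_OF_SOUND
    return 2.0 * math.pi * freq_hz * delay


def observed_ipd(x1, x2):
    """Observed IPD of one STFT bin: the phase of X1 * conj(X2)."""
    return cmath.phase(x1 * x2.conjugate())


def directional_similarity(x1, x2, freq_hz, mic_distance_m, angle_rad):
    """Cosine of the gap between observed and theoretical IPD.

    Close to 1 when the bin is consistent with the target direction.
    Taking the cosine makes the feature insensitive to 2*pi wrapping.
    """
    gap = observed_ipd(x1, x2) - theoretical_ipd(
        freq_hz, mic_distance_m, angle_rad
    )
    return math.cos(gap)


# Simulate one bin for a source at 60 degrees (hypothetical geometry).
freq_hz = 1000.0
mic_distance_m = 0.10
true_angle = math.radians(60.0)
tau = mic_distance_m * math.cos(true_angle) / SPEED_OF_SOUND
x1 = 1.0 + 0.0j
x2 = x1 * cmath.exp(-2j * math.pi * freq_hz * tau)  # microphone 2 lags by tau

sim_match = directional_similarity(x1, x2, freq_hz, mic_distance_m, true_angle)
sim_mismatch = directional_similarity(
    x1, x2, freq_hz, mic_distance_m, math.radians(120.0)
)
```

In this sketch, sim_match is close to 1 because the steering direction equals the simulated source direction, while sim_mismatch is clearly lower. A feature of this kind, computed per time-frequency bin, is one plausible way to pass a directional prior to an enhancement network.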