Separate a target speaker's speech from a mixture of two speakers
Deep Learning/Papers2read 2020. 5. 18. 10:01
For project and code or API request: https://www.catalyzex.com/paper/arxiv:2005.07074
(FaceFilter: Audio-visual speech separation using still images)
The separation is done using a deep audio-visual speech separation network. Unlike previous works that used lip movement in video clips or pre-enrolled speaker information as an auxiliary conditional feature, we use a single face image of the target speaker.
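
To make the idea of using a face image as a conditional feature more concrete, here is a rough sketch in NumPy. It is not the FaceFilter architecture: the function name `separate_with_face_embedding`, the single linear layer, the random untrained weights, and all tensor sizes are assumptions chosen only for illustration. It shows one common way such conditioning can work: an identity embedding (assumed to come from some face encoder) is repeated over every time frame, concatenated with the mixture spectrogram, and used to predict a soft mask that is applied to the mixture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def separate_with_face_embedding(mix_mag, face_emb, weights, bias):
    """Estimate a target speaker's magnitude spectrogram from a mixture.

    mix_mag:  (T, F) magnitude spectrogram of the two-speaker mixture
    face_emb: (D,) identity embedding of the target's face (hypothetical encoder output)
    weights:  (F + D, F) projection matrix (untrained here), bias: (F,)
    """
    num_frames = mix_mag.shape[0]
    # Repeat the same face embedding for every time frame.
    tiled = np.tile(face_emb, (num_frames, 1))            # (T, D)
    # Condition each frame of the mixture on the target's identity.
    features = np.concatenate([mix_mag, tiled], axis=1)   # (T, F + D)
    # Soft mask with values between 0 and 1 per time-frequency bin.
    mask = sigmoid(features @ weights + bias)             # (T, F)
    return mask * mix_mag, mask

# Toy inputs: all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
T, F, D = 100, 257, 128
mix_mag = np.abs(rng.standard_normal((T, F)))
face_emb = rng.standard_normal(D)
weights = rng.standard_normal((F + D, F)) * 0.01
bias = np.zeros(F)

target_mag, mask = separate_with_face_embedding(mix_mag, face_emb, weights, bias)
print(target_mag.shape)
```

In a real system the mask predictor would be a trained deep network and the masked spectrogram would be converted back to a waveform; the sketch only shows where the face embedding enters the computation.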