Separate a target speaker's speech from a mixture of two speakers

For project and code or API request: https://www.catalyzex.com/paper/arxiv:2005.07074

(FaceFilter: Audio-visual speech separation using still images)

Done using a deep audio-visual speech separation network. Unlike previous works that used lip movement on video clips or pre-enrolled speaker information as an auxiliary conditional feature, we use a single face image of the target speaker

Posted by uniqueone
,