Example Estimations of the Attention-Based Separation Model
Libri2Mix

[Audio examples: 20 Libri2Mix mixtures, each presented as the mixed input audio alongside the model's estimated audio for Speaker 1 and Speaker 2.]
Abstract
We present a highly efficient speech separation algorithm based on a novel compact attention mechanism within the Transformer architecture. By significantly reducing the computational complexity and model size relative to existing baselines, our approach achieves state-of-the-art efficiency on the Libri2Mix and LRS2-2Mix datasets.
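To make the efficiency idea concrete, the sketch below shows one generic way an attention block can be made "compact": projecting queries and keys into a dimension smaller than the model dimension, which shrinks the cost of forming the attention scores. This is an illustrative assumption, not the authors' actual mechanism; the function name `compact_attention`, the parameter `d_compact`, and the random stand-in weights are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def compact_attention(x, d_compact, rng):
    """Scaled dot-product attention with queries and keys projected to a
    smaller dimension d_compact < d_model, so the (t, t) score matrix is
    built from d_compact-dimensional vectors instead of d_model-dimensional
    ones. Illustrative only: random projections stand in for learned weights.
    """
    t, d_model = x.shape
    wq = rng.standard_normal((d_model, d_compact)) / np.sqrt(d_model)
    wk = rng.standard_normal((d_model, d_compact)) / np.sqrt(d_model)
    wv = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    q, k, v = x @ wq, x @ wk, x @ wv        # q, k: (t, d_compact); v: (t, d_model)
    scores = q @ k.T / np.sqrt(d_compact)   # (t, t) attention scores
    return softmax(scores) @ v              # (t, d_model) output

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))           # 16 time frames, model dim 64
y = compact_attention(x, d_compact=8, rng=rng)
print(y.shape)
```

With `d_compact=8` instead of 64, the matrix multiplies that produce the scores touch 8x fewer feature dimensions, which is the general flavor of savings a compact attention design targets.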