Is it difficult to clean up audio so that only the voice is left (i.e. remove the music and background game sounds)?

Can AI do this, or is there some other tool for it?
this emi vod would make a great moan compilation