I had a thought the other day about voice activity and its pitfalls: you'll always get that one guy who's slurping noodles or clacking away on his keyboard. It would be really neat to have a voice activity option that checks whether the sound coming into the mic is human-generated and speech-like before activating the microphone. This could be done with machine learning. The downside is that it would likely introduce a bit of input lag, because the model would have to listen to a few milliseconds of audio before making a determination. This could be remedied on the receiving client by "compressing" the first few milliseconds of speech after detection is made, i.e. playing them back slightly faster to catch up.
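
A rough sketch of the gating idea, in Python, just to make the lag trade-off concrete. Everything named here is hypothetical: `speech_gate`, `looks_like_speech`, the `lookahead` and `hangover` parameters, and the frame format are all assumptions for illustration. In particular, `looks_like_speech` is only a loudness-and-duration stand-in for the trained model the suggestion has in mind; it rejects a single short click but would not tell sustained typing from talking.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List

Frame = List[float]  # one short chunk of mic samples (assumed non-empty)


def looks_like_speech(window: List[Frame], threshold: float = 0.1) -> bool:
    """Hypothetical stand-in for an ML classifier.

    Returns True only if every frame in the window is louder than
    `threshold` on average, so very short bursts do not pass.
    """
    return all(sum(abs(s) for s in f) / len(f) > threshold for f in window)


def speech_gate(
    frames: Iterable[Frame],
    is_speech: Callable[[List[Frame]], bool],
    lookahead: int = 3,
    hangover: int = 2,
) -> Iterator[Frame]:
    """Yield only the frames that should be transmitted.

    While the gate is closed, the last `lookahead` frames are held back.
    When the classifier accepts that window, the held frames are released
    in one burst (this is the input lag) and the gate opens. The gate
    closes again after more than `hangover` consecutive non-speech frames.
    """
    pending: Deque[Frame] = deque(maxlen=lookahead)
    gate_open = False
    quiet = 0
    for frame in frames:
        if gate_open:
            quiet = 0 if is_speech([frame]) else quiet + 1
            if quiet > hangover:
                gate_open = False
                pending.clear()
            else:
                yield frame
        else:
            pending.append(frame)
            if len(pending) == lookahead and is_speech(list(pending)):
                gate_open = True
                quiet = 0
                yield from pending  # delayed onset, sent all at once
                pending.clear()


if __name__ == "__main__":
    quiet_frame, loud_frame = [0.0] * 4, [0.5] * 4
    stream = [quiet_frame] * 2 + [loud_frame] * 4 + [quiet_frame] * 4
    sent = list(speech_gate(stream, looks_like_speech))
    print(f"sent {len(sent)} of {len(stream)} frames")
```

The first `lookahead` frames of each utterance reach the receiver late and all at once, which is where the receiving client could play them back a little faster to catch up, as suggested above. That playback step is not shown here.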