Machine Learning Voice Activity

I had a thought the other day about voice activity and its pitfalls: there's always that one guy who's slurping noodles or clacking on his keyboard. It would be really neat to have a voice activity option that detects whether the sound coming into the mic is human-generated and speech-like before activating the microphone, and this could be done with machine learning. The downside is that it would likely introduce a bit of input lag, because the model would have to listen to a few milliseconds of the sound before making a determination. This could be remedied by having the receiving client "compress" the first few milliseconds of speech once detection is made.
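Here is a rough sketch of how that could fit together. It assumes 16 kHz float audio split into 10 ms frames. `LookaheadGate`, `compress`, and `stand_in_classifier` are all hypothetical names. `stand_in_classifier` is a crude energy/zero-crossing heuristic standing in for a trained model; it is not machine learning. The sender holds back a short window of frames and opens the mic only if every frame in the window looks like speech. It then sends the held-back frames too, so the start of the word isn't clipped. That backlog is the lag, roughly (window - 1) frames. The receiver can catch up on it by playing the backlog faster.

```python
from collections import deque
from typing import Callable, Deque, List

Frame = List[float]  # one block of PCM samples, assumed floats in [-1.0, 1.0]


def stand_in_classifier(frame: Frame) -> bool:
    """Hypothetical placeholder for a trained speech/non-speech model.

    Crude energy + zero-crossing-rate heuristic so the sketch runs; it is
    NOT machine learning and will not reliably reject keyboard clicks.
    """
    if len(frame) < 2:
        return False
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(frame) - 1)
    # Loud enough, and not hiss-like (hiss flips sign very often).
    return energy > 0.01 and zcr < 0.3


class LookaheadGate:
    """Hypothetical sender-side gate: listen to `window` frames before opening."""

    def __init__(self, classify: Callable[[Frame], bool], window: int = 3) -> None:
        self.classify = classify
        self.window = window
        self.pending: Deque[Frame] = deque()
        self.open = False

    def push(self, frame: Frame) -> List[Frame]:
        """Feed one captured frame; return the frames to transmit right now."""
        if self.open:
            if self.classify(frame):
                return [frame]
            # Simplified: close on the first non-speech frame (no hangover).
            self.open = False
            return []
        self.pending.append(frame)
        if len(self.pending) < self.window:
            return []
        if all(self.classify(f) for f in self.pending):
            # Open the mic and flush the held-back frames so the start of
            # the word is not clipped; this backlog is the added lag.
            self.open = True
            backlog = list(self.pending)
            self.pending.clear()
            return backlog
        # Not speech yet: slide the window by dropping the oldest frame.
        self.pending.popleft()
        return []


def compress(samples: List[float], factor: float) -> List[float]:
    """Receiver side: shorten `samples` by `factor` (linear-interpolation resample).

    Playing the backlog this way catches up on the lag but raises the pitch.
    """
    if factor <= 1.0 or len(samples) < 2:
        return list(samples)
    out_len = max(2, int(len(samples) / factor))
    step = (len(samples) - 1) / (out_len - 1)
    out = []
    for i in range(out_len):
        pos = i * step
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


if __name__ == "__main__":
    import math

    rate, n = 16000, 160  # assumed: 16 kHz audio in 10 ms frames
    silence = [0.0] * n
    # Synthetic 200 Hz tone standing in for voiced audio.
    tone = [0.5 * math.sin(2 * math.pi * 200 * i / rate) for i in range(n)]
    gate = LookaheadGate(stand_in_classifier, window=3)
    sent: List[Frame] = []
    for frame in [silence, silence, tone, tone, tone, tone, silence]:
        sent.extend(gate.push(frame))
    print(len(sent))  # 4 frames: 3-frame backlog + 1 live frame
    backlog = [s for f in sent[:3] for s in f]
    print(len(backlog), len(compress(backlog, 1.5)))
```

Linear resampling shifts the pitch up while catching up. A pitch-preserving time stretch (e.g. WSOLA) would probably suit speech better. Closing on the first non-speech frame is also a simplification; voice activity detectors usually keep the mic open for a short "hangover" period so pauses between words don't cut out.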

||Jojo||

February 22, 2019 09:49

While the idea sounds really nice, I don't think that's something up to Discord. Discord is made for gamers, and delays/lag are bad for gamers in certain situations. Getting real voice activation instead of noise activation may require the mic to have its own SoC/CPU. (I'm not sure enough people would spend money on something like that.)
