Feature request: Static spatial audio for group voice chat (with example video)

TLDR: Adding an opt-in spatial audio feature to Discord would let headphone users hear individual voices in group chats more clearly. The specifics, and a link to an example implementation, are below. I am NOT referring to syncing spatial audio to a first-person game so that voice chat comes from the direction of the player.

With mono-track voice input, it's possible to create a spatial audio effect for headphone users entirely client-side, by adding only a small time delay and volume reduction to the left or right ear. This is demonstrated here. I'll go over the specifics of implementing it below.

The feature should be off by default.
This avoids confusion for users who don't understand the feature.

The maximum time delay on the left or right ear should be 0.00037 seconds.
This is based on the speed of sound (343 m/s) and the average width of a 10-year-old female head (0.127 m): 0.127 / 343 ≈ 0.00037. A time delay larger than what is physically possible for the user's head width may cause discomfort, so 0.00037 seconds should be comfortable for all users. This delay is only about 18 samples at a 48 kHz sample rate (about 16 at 44.1 kHz). It doesn't seem to impact audio quality for users who accidentally leave the feature on while using a mono audio device.
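The delay arithmetic as a small Python sketch, using the numbers quoted above (343 m/s, 0.127 m). The sample count depends on the output sample rate; 48 kHz is assumed here because it is Opus's native rate, and 44.1 kHz is shown for comparison. The function names are illustrative only.

```python
# Hypothetical helper names; the physical constants are the ones quoted above.
SPEED_OF_SOUND_M_S = 343.0  # speed of sound in air
HEAD_WIDTH_M = 0.127        # head width used as the comfort limit


def max_delay_seconds(head_width_m: float = HEAD_WIDTH_M,
                      speed_m_s: float = SPEED_OF_SOUND_M_S) -> float:
    """Largest interaural delay: the time sound needs to cross the head."""
    return head_width_m / speed_m_s


def delay_in_samples(delay_s: float, sample_rate_hz: int) -> float:
    """Convert a delay in seconds to a (possibly fractional) number of samples."""
    return delay_s * sample_rate_hz


if __name__ == "__main__":
    d = max_delay_seconds()
    print(f"max delay: {d:.6f} s")
    for rate in (44_100, 48_000):
        print(f"{rate} Hz: {delay_in_samples(d, rate):.1f} samples")
```

At 48 kHz this prints 17.8 samples, which rounds to an 18-sample delay; at 44.1 kHz it is 16.3 samples.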

The maximum volume reduction on the left or right ear should be around -5% (-0.22 dB).
The reduction should be small enough that the user can still clearly hear everyone if they remove one side of their headphones. In the example video I arbitrarily chose a maximum volume difference of -5% (-0.22 dB). Note that -0.22 dB corresponds to 5% less power; cutting the sample amplitude by 5% would be about -0.45 dB. The exact value may not matter much, since the video shows that the time delay alone creates an auditory illusion of a volume difference. A minimal volume reduction also minimizes the impact on users who accidentally leave the feature on while using a mono audio device.
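A minimal sketch of the decibel arithmetic, for anyone tuning the exact value. "-5%" can mean a 5% cut in sample amplitude (dB = 20·log10) or a 5% cut in power (dB = 10·log10), and the two conventions give different figures. The function names are hypothetical.

```python
import math


def amplitude_ratio_to_db(ratio: float) -> float:
    """Gain in dB for a multiplier applied to sample values (amplitude)."""
    return 20.0 * math.log10(ratio)


def power_ratio_to_db(ratio: float) -> float:
    """Gain in dB for a multiplier applied to signal power (energy)."""
    return 10.0 * math.log10(ratio)


def db_to_amplitude_ratio(db: float) -> float:
    """Multiplier to apply to sample values for a given gain in dB."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    print(f"{amplitude_ratio_to_db(0.95):.2f} dB")  # 5% lower amplitude
    print(f"{power_ratio_to_db(0.95):.2f} dB")      # 5% lower power
    print(f"{db_to_amplitude_ratio(-0.22):.4f}")    # sample multiplier for -0.22 dB
```

Running it prints -0.45 dB, -0.22 dB and 0.9750, so a -0.22 dB setting scales the far-ear samples by roughly 2.5%.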

The "I'm online" dot can be used to indicate "positions" of users.
The video shows how the green "I'm online" dot could be added to the fading in-and-out green "I'm talking" circle in the UI to indicate each user's location. This also indicates the feature is turned on.

Algorithm for assigning "positions" in space.
The video shows an example of the first part of this. Client-side (not server-side), each user is assigned a "position" in space between 90° to the left and 90° to the right. A position of 90° equates to 100% of the maximum time delay and volume reduction, a position of 45° equates to 50%, etc. Treating the 90° left and 90° right positions as static "users", the algorithm would be:
0: When joining an existing call, evenly distribute user positions between -90° and 90°.
1: When a user joins, place them at the midpoint of the largest gap between any two adjacent positions.
2: If the distances to a user's two neighbors are unequal, very slowly move that user towards the farther neighbor.
This algorithm ensures even spacing of users (given enough time), without causing sudden jumps in users' positions as people join or leave the call.
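A minimal Python sketch of steps 0 to 2, under stated assumptions: positions are in degrees with -90 as hard left, the two edges are treated as fixed "users", and step 2 runs as a periodic `relax()` call whose speed cap (`max_step_deg`, 0.5° per call here) is an arbitrary illustrative value. The `SeatingPlan` class and its method names are hypothetical, not part of any Discord API.

```python
from typing import Dict, List

LEFT_EDGE = -90.0   # fixed "user" at hard left
RIGHT_EDGE = 90.0   # fixed "user" at hard right


class SeatingPlan:
    """Client-side positions in degrees (-90 = hard left, +90 = hard right)."""

    def __init__(self) -> None:
        self.positions: Dict[str, float] = {}

    def join_existing_call(self, user_ids: List[str]) -> None:
        """Step 0: spread everyone already in the call evenly between the edges."""
        gap = (RIGHT_EDGE - LEFT_EDGE) / (len(user_ids) + 1)
        self.positions = {
            uid: LEFT_EDGE + gap * (i + 1) for i, uid in enumerate(user_ids)
        }

    def _sorted_with_edges(self) -> List[float]:
        return [LEFT_EDGE] + sorted(self.positions.values()) + [RIGHT_EDGE]

    def add_user(self, user_id: str) -> float:
        """Step 1: place a newcomer at the midpoint of the largest gap."""
        points = self._sorted_with_edges()
        left, right = max(zip(points, points[1:]), key=lambda pair: pair[1] - pair[0])
        self.positions[user_id] = (left + right) / 2.0
        return self.positions[user_id]

    def remove_user(self, user_id: str) -> None:
        # Nobody jumps when someone leaves; relax() closes the gap slowly.
        self.positions.pop(user_id, None)

    def relax(self, max_step_deg: float = 0.5) -> None:
        """Step 2: nudge each user toward its farther neighbor, at most max_step_deg."""
        order = sorted(self.positions, key=self.positions.get)
        points = self._sorted_with_edges()
        new_positions: Dict[str, float] = {}
        for i, uid in enumerate(order):
            left, here, right = points[i], points[i + 1], points[i + 2]
            imbalance = (right - here) - (here - left)  # > 0: the right gap is larger
            # Moving by imbalance / 2 would equalize the two gaps; cap it so motion stays slow.
            step = max(-max_step_deg, min(max_step_deg, imbalance / 2.0))
            new_positions[uid] = here + step
        self.positions = new_positions
```

For example, with users a and b already in the call they sit at -30° and 30°; a newcomer c goes to -60° (the first of three equal 60° gaps), and repeated `relax()` calls then drift the three users toward -45°, 0° and 45°. Capping each move at half the gap imbalance means a user never overshoots the point where its gaps to the current neighbors are equal.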

This feature does not affect Discord's servers.
Audio streams are still transmitted as mono, and all calculations are done client-side. Time delays and volume reductions are computationally simple, so the impact on client computers is minimal. Clients would only need to hold a 0.00037-second buffer of audio per speaker: about 18 samples at 48 kHz, or roughly 72 bytes as 32-bit floats, well under 1 KB.
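A minimal sketch of the per-speaker rendering step, showing how little state it needs. Assumptions, all hypothetical: a 48 kHz output rate, negative positions meaning the speaker is on the left, a linear mapping from angle to delay and gain, "-5%" read as a 5% cut in sample amplitude, and the delay rounded to whole samples (a production version might use fractional-delay interpolation instead). The only state carried between chunks is the last 18 samples of each speaker's stream. `SpeakerPanner` is an illustrative name, not a Discord API.

```python
from typing import List, Tuple

SAMPLE_RATE_HZ = 48_000        # assumed output sample rate
MAX_DELAY_S = 0.127 / 343.0    # ~0.00037 s, from the head-width estimate
MAX_FAR_EAR_CUT = 0.05         # assumed: 5% lower sample amplitude at +/-90 degrees


class SpeakerPanner:
    """Turns one speaker's mono stream into stereo using only a delay and a gain on the far ear.

    position_deg: -90 = hard left, 0 = centre, +90 = hard right (hypothetical convention).
    """

    def __init__(self, position_deg: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> None:
        self.max_delay = round(MAX_DELAY_S * sample_rate_hz)  # 18 samples at 48 kHz
        self.history = [0.0] * self.max_delay                 # tail of the previous chunk
        self.set_position(position_deg)

    def set_position(self, position_deg: float) -> None:
        fraction = min(abs(position_deg), 90.0) / 90.0        # 45 deg -> 0.5, 90 deg -> 1.0
        self.delay = round(fraction * self.max_delay)         # whole samples only (simplification)
        self.far_gain = 1.0 - fraction * MAX_FAR_EAR_CUT
        self.far_is_right = position_deg < 0                  # source on the left: right ear is far

    def process(self, mono: List[float]) -> List[Tuple[float, float]]:
        """Return (left, right) frames for one chunk of mono samples."""
        padded = self.history + mono
        out = []
        for i, sample in enumerate(mono):
            # padded[self.max_delay + i] is the current sample; step back by self.delay.
            far = padded[self.max_delay + i - self.delay] * self.far_gain
            out.append((sample, far) if self.far_is_right else (far, sample))
        # Keep the tail so the delay carries over into the next chunk.
        self.history = padded[len(padded) - self.max_delay:]
        return out
```

Mixing a call then amounts to summing the (left, right) frames produced by each speaker's panner, and moving a speaker is just another `set_position()` call with the value from the position algorithm.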

Don't mimic frequency-dependent sound reduction.
A complete spatial audio model accounts for how lower frequencies bend around the head more easily and so lose less volume at the far ear. Don't implement this: the effect works without it, it would add latency to the audio, and it's a research project the Discord devs don't have time for.

This feature would likely not sound good with music playing bots like "Rythm".
Users likely expect music to sound like it's all around them, not coming from a specific position.

This feature has been requested by other users.
Here. And here. Also here.

deepfriedchril

January 17, 2023 22:25

This would be huge, even with small groups as shown in your linked example. Hearing multiple people talking at the same time with flat audio makes it very difficult to understand what each person is saying, which leads to the awkward "everybody takes their turn to talk" style of communication, whereas positional audio made it not only possible but easy to listen to multiple people talking at the same time.

I hope that this will be implemented.

Mismatch

January 18, 2023 02:05

That's something I've been pushing for ever since I've known Discord! It was available in TeamSpeak 3, even before I knew about Mumble. It would be a really good addition to have automatic "2D spatial distribution" as a toggle option, possibly with "preferred positions" on a per-friend basis. That would be the coolest, because you'd always know who's talking: someone unknown, or someone you're not close with, would get a random position, but a friend who already has a "preferred position" would be instantly recognizable (not only by voice, but also by position, EVEN over other simultaneous speakers).

Also, if feasible, it would be a plus to provide a positional API for games and third-party plugins. Many FiveM and RedM servers use TeamSpeak (with plugins) or Mumble for that exact purpose.

Thomasburgess2000

December 27, 2023 23:05

This would be huge!! Especially as spatial audio headsets become more common, people will expect more things to be natively spatialized.
