
Imagine sitting at a roundtable where five people speak five different languages, and still understanding every word they say. That's what researchers at the University of Washington have made possible with their new AI system: Spatial Speech Translation.
What’s the Innovation?
Traditional translation tools struggle when several people talk at once. They usually focus on one voice and treat the rest as background noise. This new system breaks that barrier: it can translate multiple voices at the same time, in real time.
It goes further than that. It also clones each speaker's voice and preserves the direction the sound comes from, so the translated speech sounds as if each person is still talking in their own voice from their original position.
“It’s like real-time subtitles, but with sound—and everyone keeps their own voice,” says the research team.
How Does It Work?
The system relies on AI models trained to handle messy, real-world conversations. It can separate overlapping speech, tell different speakers apart, and clone each speaker's voice. After translating what was said, it plays the translation back using spatial audio, preserving the direction from which each person originally spoke.
All of this runs on high-performance devices such as laptops with Apple's M2 chip or the Apple Vision Pro headset. Importantly, the entire process happens locally on the device, so no audio is sent to the cloud, which helps keep users' voice data private.
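The last step, placing each translated voice back where its speaker was, is the easiest to illustrate. The sketch below is a minimal Python example under stated assumptions, not the researchers' method: it uses plain stereo constant-power panning, where a mono voice is split between the left and right channels according to its direction (azimuth), and the function names `pan_stereo` and `mix_speakers` are hypothetical. Real spatial-audio playback on headphones usually relies on more sophisticated binaural (HRTF-based) rendering, so treat this only as an illustration of the idea.

```python
import math

# Hypothetical helpers for illustration only: simple constant-power stereo
# panning, NOT the rendering method used by the research system.

def pan_stereo(samples, azimuth_deg):
    """Place a mono signal in a stereo field.

    azimuth_deg runs from -90 (hard left) through 0 (straight ahead)
    to +90 (hard right). Returns (left, right) sample lists.
    """
    # Clamp the direction to the range this simple model supports.
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))
    # Map azimuth [-90, 90] degrees onto a pan angle [0, pi/2] radians.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2)
    # cos^2 + sin^2 = 1, so total power stays constant at every position.
    left_gain, right_gain = math.cos(theta), math.sin(theta)
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right


def mix_speakers(voices):
    """Mix several (samples, azimuth_deg) voices into one stereo pair."""
    length = max(len(samples) for samples, _ in voices)
    left = [0.0] * length
    right = [0.0] * length
    for samples, azimuth in voices:
        l, r = pan_stereo(samples, azimuth)
        # Sum each panned voice into the shared output channels;
        # shorter voices are effectively padded with silence.
        for i in range(len(samples)):
            left[i] += l[i]
            right[i] += r[i]
    return left, right
```

For example, `mix_speakers([(voice_a, -45), (voice_b, 30)])` would place one translated voice at the listener's front-left and another slightly to the right. Constant-power panning is chosen here because it keeps each voice equally loud wherever it is placed.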
How Fast Is It?
Translations arrive after a delay of about 2 to 4 seconds, short enough to keep conversations flowing naturally. During testing, users reported that the system maintained clarity and coherence even with background noise or several people speaking at once.
Testers did not find this slight delay disruptive, especially given how much work is involved in separating, translating, and cloning several voices in real time. Compared with existing translation tools that handle only one speaker at a time, this is a significant step forward.
Why It Matters
This could completely change the way we:
- Travel and interact abroad
- Join multilingual meetings
- Work in international teams
- Learn in diverse classrooms
The new system could reduce language barriers to an almost imperceptible level.
What’s Next?
While it's still a prototype, the researchers believe it could be commercialized soon. It's a big step forward for real-time communication: it doesn't just translate speech, it makes the translated conversation feel natural and human.
Prepared by Navruzakhon Burieva