Hey all!
I’m looking to build a small app where you can interact with an avatar in XR. First step towards that is speech to text. What’s the best way to allow users to speak and convert what they say into text?
Thanks a ton!
Ahmed
Hey Ahmed, I’ve done this before in an 8th Wall project by integrating OpenAI’s Speech to Text via its API.
It’s quite straightforward: you call getUserMedia to request permission to record audio, then use the MediaRecorder API to do the recording.
Preferably include a UI that makes it clear to the user when to start recording and when to stop. KISS.
Once the recording is done, you can listen for the ondataavailable event to capture the recorded data and convert it to a format that’s accepted by the Speech to Text API (like mp3). POST the mp3 to the API and it will return a transcript.
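For anyone who wants a starting point, the steps above can be sketched roughly as follows. This is a minimal sketch under a few assumptions, not the code from my project: the helper names (pickMimeType, extensionFor, recordAudio, transcribe) and the candidate MIME list are made up for illustration, and it uses the "whisper-1" model on OpenAI's /v1/audio/transcriptions endpoint. That endpoint lists webm among its accepted formats, so the sketch uploads the recorder's output as-is instead of converting to mp3. In a real app you would probably route the request through your own backend rather than ship the API key to the browser.

```typescript
// Pick the first audio container this browser's MediaRecorder can produce.
// The candidate list is an assumption; adjust it for the browsers you target.
function pickMimeType(isSupported: (type: string) => boolean): string | undefined {
  const candidates = ["audio/webm", "audio/mp4", "audio/ogg"];
  return candidates.find(isSupported);
}

// Derive a file extension from a MIME type so the upload has a recognizable name.
function extensionFor(mimeType: string): string {
  const [essence = ""] = mimeType.split(";"); // drop parameters such as ";codecs=opus"
  const [, subtype] = essence.trim().split("/");
  return subtype || "webm"; // fall back to webm when the type is empty
}

// Ask for the microphone and start recording. Call stop() to get the audio Blob.
async function recordAudio(): Promise<{ stop: () => Promise<Blob> }> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const mimeType = pickMimeType((type) => MediaRecorder.isTypeSupported(type));
  const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
  const chunks: Blob[] = [];

  // ondataavailable delivers the recorded data in one or more chunks.
  recorder.ondataavailable = (event) => {
    if (event.data.size > 0) chunks.push(event.data);
  };
  recorder.start();

  return {
    stop: () =>
      new Promise<Blob>((resolve) => {
        recorder.onstop = () => {
          stream.getTracks().forEach((track) => track.stop()); // release the microphone
          resolve(new Blob(chunks, { type: recorder.mimeType }));
        };
        recorder.stop();
      }),
  };
}

// Upload the recording as multipart form data and return the transcript text.
async function transcribe(audio: Blob, apiKey: string): Promise<string> {
  const form = new FormData();
  form.append("file", audio, `speech.${extensionFor(audio.type)}`);
  form.append("model", "whisper-1");

  const response = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!response.ok) throw new Error(`Transcription failed: ${response.status}`);
  const result = (await response.json()) as { text: string };
  return result.text;
}
```

Typical use would be to call recordAudio() when the user taps a record button, call stop() on the returned object when they tap again, and pass the resulting Blob to transcribe().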
I can vouch for how good it is: it spelled my name perfectly in the middle of an English sentence.
Alternatively, you could use 8th Wall’s Media Recorder and hack your way into extracting the audio from the videoBlob that it returns. I’m not sure whether you can record audio only with it.
Let me know how it works!
Awesome! Thank you Florencia
The accuracy is hard to beat. Just a heads up for anyone implementing this: make sure to handle the audio permissions carefully on iOS, as the microphone can be finicky if the user hasn’t interacted with the screen first. Keeping the UI ‘KISS’ like you mentioned is definitely key!
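One way to cover both points is to request the microphone only from inside a tap handler and let a single button show what state you are in. Here is a rough sketch of that idea, with assumptions flagged: RecorderState, MicButton, buttonLabel and wireMicButton are hypothetical names, the label text is a placeholder, and startRecording is assumed to be whatever function calls getUserMedia in your app.

```typescript
type RecorderState = "idle" | "recording" | "transcribing";

// The minimal slice of a button element this sketch needs (an HTML button fits).
interface MicButton {
  textContent: string | null;
  disabled: boolean;
  addEventListener(type: "click", listener: () => void): void;
}

// One label per state, so the user always knows whether to start, stop, or wait.
function buttonLabel(state: RecorderState): string {
  switch (state) {
    case "idle":
      return "Tap to speak";
    case "recording":
      return "Tap to stop";
    case "transcribing":
      return "Transcribing...";
  }
}

function wireMicButton(
  button: MicButton,
  startRecording: () => Promise<void>, // assumed to call getUserMedia
  stopAndTranscribe: () => Promise<string>,
  onTranscript: (text: string) => void,
): void {
  let state: RecorderState = "idle";
  let busy = false; // ignore taps while a previous tap is still being handled

  const render = () => {
    button.textContent = buttonLabel(state);
    button.disabled = state === "transcribing";
  };

  button.addEventListener("click", async () => {
    if (busy) return;
    busy = true;
    try {
      if (state === "idle") {
        // Runs as part of the tap, so the permission prompt follows a user gesture.
        await startRecording();
        state = "recording";
      } else if (state === "recording") {
        state = "transcribing";
        render();
        onTranscript(await stopAndTranscribe());
        state = "idle";
      }
    } catch (error) {
      // Permission denied or upload failed: go back to the start.
      console.error("Recording failed", error);
      state = "idle";
    } finally {
      busy = false;
      render();
    }
  });

  render();
}
```

The key design choice is that nothing touches the microphone on page load; the first getUserMedia call only happens after the user has tapped the button.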
For speech-to-text in an XR app, I’d recommend starting with OpenAI’s Whisper model: it’s accurate, supports multiple languages, and can run locally/offline on decent hardware, which avoids network latency in immersive environments.