Below, I give a simple introduction to ChatGPT and its challenges, and answer the question: can ChatGPT transcribe audio?
ChatGPT: An Overview
ChatGPT is one of the most popular AI models. It is used to automatically generate content, solve problems, and carry out a variety of tasks through a question-and-answer format. OpenAI, the company behind ChatGPT, trained the model to interact with people by answering the questions they ask it.
For example, a developer might have an issue with some programming code. They could paste the code into ChatGPT and ask a question like “Why is this code not working as expected?” The AI model would then analyze the question and the code provided and respond with an answer. This could be a solution, or the model could ask additional questions if the developer didn’t provide enough context.
This type of conversational process is incredibly useful because it creates a realistic back and forth and allows the user to get exactly what they want, provided they can give the right information.
ChatGPT’s Transcription Abilities
So, can ChatGPT transcribe audio? Yes! ChatGPT has a dedicated transcription function called the Whisper API, which OpenAI also developed. The process is relatively simple:
- Open ChatGPT.
- Upload your audio file.
- ChatGPT will then run it through the Whisper API speech recognition algorithm.
- This processes the speech and spits out a text output.
- You can save the text output in a variety of file formats.
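For readers who would rather script the steps above, here is a minimal sketch in Python. It assumes the official `openai` package (version 1 or later), an API key in the `OPENAI_API_KEY` environment variable, and the `whisper-1` model name. The function name `transcribe_file` and the file `interview.mp3` are made up for illustration. Note that this uses the developer API rather than the ChatGPT interface itself, so treat it as one possible route and not the only one.

```python
from pathlib import Path


def transcribe_file(client, audio_path, response_format="text"):
    """Send one audio file to the transcription endpoint and return the result.

    `client` is assumed to be an `openai.OpenAI()` instance; any object that
    exposes `audio.transcriptions.create` will work the same way.
    """
    path = Path(audio_path)
    # The audio must be opened in binary mode before it is uploaded.
    with path.open("rb") as audio_file:
        return client.audio.transcriptions.create(
            model="whisper-1",  # assumed model name
            file=audio_file,
            response_format=response_format,
        )


# Example usage (needs network access and an API key, so it is commented out):
# from openai import OpenAI
# client = OpenAI()  # reads OPENAI_API_KEY from the environment
# print(transcribe_file(client, "interview.mp3"))
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before spending any API credits.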
Supported audio file formats currently include MP3, MP4, MPEG, M4A, WAV, WEBM, and MPGA, and a range of output formats is supported too.
In terms of language support, ChatGPT currently supports around 50 languages, including Hindi, Greek, Arabic, Polish, Urdu, and Swahili.
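A quick way to check a file before uploading it is to compare its extension against the formats listed above. The sketch below is only an illustration: the names `SUPPORTED_EXTENSIONS` and `is_supported_audio` are made up, and the check looks at the file name only, not at the actual contents of the file.

```python
from pathlib import Path

# Formats taken from the list in this article; the set may change over time.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "m4a", "wav", "webm", "mpga"}


def is_supported_audio(filename):
    """Return True if the file extension is one of the listed formats."""
    extension = Path(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


print(is_supported_audio("meeting.MP3"))    # True
print(is_supported_audio("voicemail.ogg"))  # False
```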
Accuracy and Performance
ChatGPT can convert audio to text and it is relatively accurate. The speech recognition can falter depending on the audio quality, although this holds for any transcription service.
The processing time is relatively quick too, and it’s certainly on par with other transcription services in terms of the time it takes to analyze audio files and generate the text output.
Drawbacks vs Other Transcription Services
The main drawback compared to other transcription services such as Transkriptor is the learning curve. ChatGPT is a specialist AI model and has a much steeper learning curve than something as easy to use as Transkriptor (see Transkriptor vs Microsoft Copilot).
Ideally, you need to understand how the AI model works, what it is capable of, and how the question-and-answer format works. This means it is better suited to professionals, those with some prior knowledge of AI models, and those who have used ChatGPT before.
To improve the quality of the audio transcription, you have to put follow-up questions to the Whisper API model, which also takes additional learning. Once you get used to how it works and the types of questions to ask, it becomes intuitive, but if you want a quick, quality transcription, ChatGPT isn’t currently the best option available.
Compared to traditional online audio-to-text transcription services, ChatGPT is limited in terms of languages, speech recognition complexity, and input/output files. This makes dedicated transcription services a more reliable choice, especially when you consider the added benefits of transcription services for SEO, such as enhancing your content's searchability and online presence. Currently, ChatGPT simply can’t compete on a like-for-like basis with dedicated transcription services.
Lastly, a major drawback is the maximum audio file size, which is 25MB. Longer recordings of things like interviews and meetings can easily exceed this, so you are limited in which types of audio you can transcribe. You could use an audio compression service to reduce the file size of a longer meeting, for example, but this could reduce the audio quality and result in a poorer-quality transcription.
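To get a feel for what the 25MB limit means in practice, the rough calculation below estimates how many minutes of audio fit under it at a few constant bitrates. It is a sketch under stated assumptions: 1MB is taken as 1024 * 1024 bytes, the bitrates are common MP3 settings chosen for illustration, container overhead is ignored, and the function names are made up.

```python
# 25MB limit from the article, assuming 1MB = 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def max_minutes_at_bitrate(bitrate_kbps):
    """Estimate how many minutes of constant-bitrate audio fit under the limit."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return MAX_UPLOAD_BYTES / bytes_per_second / 60


def fits_upload_limit(size_bytes):
    """Return True if a file of this size is within the limit."""
    return size_bytes <= MAX_UPLOAD_BYTES


for kbps in (32, 64, 128):
    print(f"{kbps} kbps: about {max_minutes_at_bitrate(kbps):.0f} minutes")
# 32 kbps: about 109 minutes
# 64 kbps: about 55 minutes
# 128 kbps: about 27 minutes
```

Under these assumptions, a one-hour meeting recorded at 128 kbps would be more than twice the limit, which is why compression or splitting the file comes up for longer recordings.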
ChatGPT Can Transcribe Audio But With Limitations
To answer the original question: can ChatGPT transcribe audio? Yes, it can, but it is by no means a polished service, and in its current iteration there is a range of drawbacks. The steeper learning curve and the need to understand the Q&A model of the Whisper API mean that obtaining a quality audio-to-text transcription can be a slower process.
Additionally, the AI model is still being developed, so it can’t match traditional transcription services in terms of features, accuracy, and language support. The 25MB audio file size limit is something to consider too, and it can be limiting if you have larger audio files to transcribe.
This could all change in the future and over time ChatGPT could become one of the leading audio-to-text transcription services. However, as it stands, using a dedicated transcription service that has a proven track record is the better option.