AssemblyAI could be improved in how it handles different accents on the same call. When American, Asian, and Latin American speakers are on the same call, it usually struggles, and the transcriptions come out a bit noisy. The transcription quality for non-native English speakers should be improved. I rate it nine out of ten because it is really good and fast: when there is a native English speaker on the call, the transcription quality is really good, latency is almost zero, and it is 20 to 40% faster than the industry benchmarks. It loses a point only because it lacks accent detection and the quality drops across different accents.
A few drawbacks I observed concern speaker identification. In some videos where names appear as text in the video frames, AssemblyAI does not identify the actual speaker name and instead returns generic labels such as Speaker A, Speaker B, Speaker C, or Speaker X, Y, Z. The same happens with some audio files, so the real speakers are not always easy to identify. AssemblyAI does not provide a cloud service; I simply upload the audio file to the API, and they store it somewhere internally to send me the transcription text. In addition, the API does not provide video uploading functionality, so I need to convert video to audio first before uploading it to AssemblyAI.
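The video-to-audio step described above can be scripted. The following is only a minimal sketch, assuming ffmpeg is installed and on the PATH; the function names `build_extract_audio_command` and `extract_audio`, the WAV output, and the mono 16 kHz settings are illustrative assumptions, not anything AssemblyAI requires.

```python
import subprocess
from pathlib import Path


def build_extract_audio_command(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and writes audio only."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", video_path,
        "-vn",           # discard the video stream
        "-ac", "1",      # downmix to one channel (assumed choice for speech)
        "-ar", "16000",  # resample to 16 kHz (assumed choice for speech)
        audio_path,
    ]


def extract_audio(video_path: str) -> str:
    """Run ffmpeg and return the path of the audio file written next to the video."""
    audio_path = str(Path(video_path).with_suffix(".wav"))
    subprocess.run(build_extract_audio_command(video_path, audio_path), check=True)
    return audio_path
```

Calling `extract_audio("meeting.mp4")` would write `meeting.wav` beside the source file, which can then be uploaded to the transcription API in place of the video.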
AssemblyAI offers advanced speech recognition technology tailored for developers. Its robust API facilitates easy integration into existing systems, making it a versatile option for many applications. AssemblyAI's proficiency in speech-to-text conversion is highly regarded. By leveraging state-of-the-art machine learning models, it provides reliable transcription and voice processing capabilities. Its adaptable API design supports integration across desktop, mobile, and web platforms. This...