The major challenge with Google Cloud Speech-to-Text is that not every call is clear. Our representative may be in a quiet environment, but the client can be anywhere, so we need to manage background noise on every call, and audio clarity remains a challenge. Noise sometimes affects speaker diarization, causing speech to be attributed to the wrong speaker. These issues force us to run an AI over the complete Speech-to-Text output to correct it. The output can also contain grammatical errors, especially since many of our clients speak Spanish; sometimes the transcription comes back in English when it should be in Spanish, and the AI can correct that as long as we provide the right context. A crucial update would be built-in autocorrection of the output, because irrelevant words occasionally appear. In our integration, we post-process the output with an AI to fix grammar mistakes and other issues. If that capability were built into Google Cloud Speech-to-Text, it would remove the need for a third-party process, and we could simply call the API without formatting or correcting the output separately.
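For context on the diarization and language issues described above, the Speech-to-Text v1 REST API does expose knobs for both: a `diarizationConfig` block for speaker separation and `alternativeLanguageCodes` for mixed English/Spanish audio. The following is a minimal sketch of a `speech:recognize` request body using those documented fields; the bucket URI, speaker counts, and encoding values are illustrative assumptions, not the reviewer's actual settings.

```python
import json

# Sketch of a v1 speech:recognize request body. The GCS URI is a
# placeholder; speaker counts and encoding are illustrative.
request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 8000,
        "languageCode": "en-US",
        # Let the recognizer fall back to Spanish when the caller speaks it.
        "alternativeLanguageCodes": ["es-US"],
        # Ask for speaker labels so each recognized word carries a speakerTag.
        "diarizationConfig": {
            "enableSpeakerDiarization": True,
            "minSpeakerCount": 2,  # representative + client
            "maxSpeakerCount": 2,
        },
    },
    "audio": {"uri": "gs://example-bucket/call.wav"},
}

print(json.dumps(request_body, indent=2))
```

This body would be POSTed to `https://speech.googleapis.com/v1/speech:recognize` with an OAuth token; diarized output then tags each word in the response with a `speakerTag`, which is what downstream correction logic would consume.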
The tool's telephony model does not produce accurate results. With the telephony model, we want to transcribe phone calls live, but this is not possible in our Google module since it does not support the Cloud Translation V2 API. For pre-recorded calls it does give you the transcript, but in some cases, if the call is put on hold for five, ten, or fifteen minutes, whatever we spoke about afterward does not get recorded, which is an issue.
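For reference, the telephony model the reviewer mentions is selected through the `model` and `useEnhanced` fields of the v1 recognition config. This is a hedged sketch of such a config for 8 kHz phone audio; the encoding and sample rate are illustrative assumptions, and the streaming wrapper shows only the request shape, not a live session.

```python
# Sketch of a recognition config selecting the phone-call model.
# "model" and "useEnhanced" are documented v1 fields; the encoding
# and sample rate are illustrative assumptions for telephony audio.
telephony_config = {
    "encoding": "MULAW",
    "sampleRateHertz": 8000,
    "languageCode": "en-US",
    "model": "phone_call",  # model variant tuned for telephone audio
    "useEnhanced": True,    # opt in to the enhanced phone_call model
}

# For live transcription, the same config rides inside a streaming
# request with interim (partial) results enabled.
streaming_request = {
    "streamingConfig": {"config": telephony_config, "interimResults": True}
}
print(streaming_request["streamingConfig"]["config"]["model"])
```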
Google Cloud Speech-to-Text's price could be improved. Its trial experience could also be improved by adding extra minutes to the trial version.
Director of Research and Regulatory Affairs at SafetySpect Inc
Real User
May 30, 2023
One thing I find is that I often use specialized terms, and the solution doesn't know them. I'm not sure there's an easy way around this, so I usually have to delete the text and correct it manually. It would be good if I could just tell it to change the spelling, and it would be smart enough to figure that out.
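A partial workaround for unrecognized specialized terms is the API's speech adaptation feature: `speechContexts` phrase hints bias recognition toward domain vocabulary. This is a sketch using the documented v1 fields; the phrases and boost value are hypothetical examples, not the reviewer's actual terms.

```python
# Sketch: biasing recognition toward domain vocabulary with phrase hints.
# The phrases and boost value below are hypothetical illustrations.
adapted_config = {
    "languageCode": "en-US",
    "speechContexts": [
        {
            # Specialized terms the recognizer would otherwise mishear.
            "phrases": ["hyperspectral", "fluorescence imaging", "regulatory affairs"],
            "boost": 15.0,  # how strongly to favor these phrases
        }
    ],
}
print(adapted_config["speechContexts"][0]["phrases"])
```

This does not teach the model new spellings on the fly, but it reduces how often known domain terms come back wrong in the first place.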
The multilanguage support for the chatbot needs to be better. If translation were incorporated natively into the chatbot, you could build a chatbot in English and automatically have the same chatbot in all other languages, since Google's translation services are good. The translation is not bad for technical content. If they offered a multilanguage chatbot, it would be ideal.
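Pending native chatbot translation, a common stopgap is to route each bot reply through Cloud Translation v2 yourself. This is a sketch of the v2 request body (POSTed to `https://translation.googleapis.com/language/translate/v2` with an API key); the reply text and language pair are illustrative assumptions.

```python
# Sketch of a Cloud Translation v2 request body for translating a
# chatbot reply. The text and language pair are illustrative.
translate_request = {
    "q": "How can I help you today?",  # bot reply authored in English
    "source": "en",
    "target": "es",
    "format": "text",  # plain text, not HTML
}
print(translate_request["target"])
```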
Google Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. It can process real-time streaming or prerecorded audio, using Google’s machine learning technology.
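The easy-to-use API the description refers to can be made concrete with the smallest possible batch request: a config plus inline base64 audio. The audio bytes below are fake placeholder data, and the config values are illustrative, so this shows only the request shape.

```python
import base64

# Minimal sketch of a batch speech:recognize request with inline audio.
# The audio bytes are fake placeholder data, not a real recording.
fake_audio = b"\x00\x01" * 4
recognize_request = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
    },
    # Inline audio must be base64-encoded in the JSON body.
    "audio": {"content": base64.b64encode(fake_audio).decode("ascii")},
}
print(recognize_request["config"]["languageCode"])
```

Prerecorded audio uses this batch shape; real-time use swaps it for the streaming variant of the same config.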