Google Cloud Speech-to-Text is not fully accurate, so our AI software has to correct for its errors. It is built on neural networks, and in our experience that stochastic processing is only about 70% to 75% accurate. It gets things wrong too often, and since I work with it personally, I don't appreciate that. Still, it seems to be the best option currently available. We have to write our own improvements because Google's tools for improving transcription accuracy in our domain aren't very powerful. The word-level timestamps are inadequate, so we don't use them. We interpret words based on their meaning, and we have a whole AI engine that does that, which is one of our differentiators from a product standpoint. We didn't use the custom voice creation feature; we just use Google's existing voices, which are fine for our purposes.
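
To make the idea of correcting transcription errors after the fact concrete, here is a minimal sketch of one simple kind of domain-specific post-correction. It is only an illustration and not the meaning-based engine described above: it assumes a fixed list of domain terms and uses fuzzy string matching from Python's standard `difflib` module. The vocabulary, the function name `correct_transcript`, and the `cutoff` value are all hypothetical.

```python
import difflib
import re

# Hypothetical domain vocabulary (assumption for illustration only);
# a real list would come from the product's own terminology.
DOMAIN_TERMS = ["metoprolol", "lisinopril", "atorvastatin"]


def correct_transcript(text, vocabulary=DOMAIN_TERMS, cutoff=0.8):
    """Replace words that closely resemble a domain term with that term.

    Words with no sufficiently similar term are left unchanged.
    """
    def fix(match):
        word = match.group(0)
        # get_close_matches returns at most n terms whose similarity
        # ratio to the word is at least `cutoff`, best match first.
        candidates = difflib.get_close_matches(
            word.lower(), vocabulary, n=1, cutoff=cutoff
        )
        return candidates[0] if candidates else word

    # Apply the fix to each alphabetic word, leaving punctuation
    # and spacing untouched.
    return re.sub(r"[A-Za-z]+", fix, text)
```

Calling `correct_transcript("take metoprolal daily")` would replace the misrecognized `metoprolal` with `metoprolol` and leave the other words alone. A higher `cutoff` makes the pass more conservative, which matters because an overly loose match can turn a correct everyday word into a domain term.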
