I wanted to use the solution in my company for topics. In the beginning, we did a proof of concept of the data analytics part, but we didn't go ahead with it. We are currently using Amazon Kinesis Data Streams.
The best thing about the tool is that I can use it as a normal managed service. Similar to Kafka topics, we use Amazon Kinesis in our company. It is a managed service that can be sharded automatically based on consumption and consumer needs, so we don't have to worry about that. Sometimes, we also get throughput errors and data contention when multiple jobs try to hit the same stream.
There are some hard limits on Amazon Kinesis, and if you hit them, you will get a throughput exceeded error. We had to deal with that by reducing how many consumers were hitting Amazon Kinesis Data Streams. We combined multiple jobs into a single job so that multiple jobs were not reading from the same stream.
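As a rough sketch of how a consumer can react to those throughput exceeded errors, something like the following with boto3 in Python works; the retry count and backoff values here are just placeholder assumptions, not our actual configuration:

```python
import time
import boto3
from botocore.exceptions import ClientError

kinesis = boto3.client("kinesis")

def read_with_backoff(shard_iterator, max_retries=5):
    """Read records, backing off when the read throughput limit is hit."""
    for attempt in range(max_retries):
        try:
            return kinesis.get_records(ShardIterator=shard_iterator, Limit=1000)
        except ClientError as err:
            if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
                time.sleep(2 ** attempt)  # exponential backoff before retrying
            else:
                raise
    raise RuntimeError("Still throttled after retries")
```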
I have been using Amazon Kinesis for two years. My company has a partnership with AWS.
There is no problem with the tool's stability. With the tool, you cannot send or store more than one MB of data in one go. If you have two MB of data that you want to store, you have to split it into two messages and then store it. Later on, on the consumer side, you have to deal with that and see how you can merge the data back together. For most of the devices, the data we get is not more than one MB, but in some use cases, we have to handle this from our end.
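To illustrate the kind of splitting and merging I mean, here is a minimal Python sketch; the chunk size, field names, and hex encoding are my own assumptions for the example, not a prescribed format:

```python
MAX_CHUNK = 900_000  # stay safely under the 1 MB per-record limit (assumed margin)

def split_payload(payload: bytes, record_id: str):
    """Split an oversized payload into numbered chunks that each fit in one record."""
    chunks = [payload[i:i + MAX_CHUNK] for i in range(0, len(payload), MAX_CHUNK)]
    return [
        {"id": record_id, "part": n, "total": len(chunks), "data": chunk.hex()}
        for n, chunk in enumerate(chunks)
    ]

def merge_payload(records):
    """Reassemble the original payload from its chunks on the consumer side."""
    parts = sorted(records, key=lambda r: r["part"])
    return b"".join(bytes.fromhex(p["data"]) for p in parts)
```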
Scalability-wise, I rate the solution a five out of ten since we faced some errors in the tool.
In terms of integrating Amazon Kinesis with other tools, I would say that we don't directly do anything. We have our own jobs that consume the data. There are multiple ways to read from Amazon Kinesis. If you have a traditional job, you can use SDK-based libraries to send or store data. If you want to consume data, there are ways to do it directly, such as using JAR-based libraries to read the data. If you have Redshift or Amazon Kinesis Data Firehose, you can send data directly to S3 and consume it later. There are multiple options to consume data, but we are currently consuming via a script rather than sending it directly to S3 or some other database.
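For anyone who wants to see what the SDK-based path looks like, here is a minimal boto3 sketch of sending and then reading records; the stream name, partition key, and payload are hypothetical, and a real consumer would also page through shards and iterators:

```python
import boto3

kinesis = boto3.client("kinesis")
STREAM = "device-events"  # hypothetical stream name

# Producer side: send one record keyed by device id.
kinesis.put_record(StreamName=STREAM, Data=b'{"temp": 21.5}', PartitionKey="device-42")

# Consumer side: open a shard iterator and poll for records.
shards = kinesis.list_shards(StreamName=STREAM)["Shards"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM,
    ShardId=shards[0]["ShardId"],
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]
records = kinesis.get_records(ShardIterator=iterator, Limit=100)["Records"]
for record in records:
    print(record["Data"])
```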
I recommend the tool to others.
If you have to deal with more than one MB, then you have to decide how you want to send the data. If you just put the data in chunks, then the consuming job has to combine them, which can be tricky. Another way is to store only the metadata on the stream and store the actual data somewhere else, like in SQL or another store, and put the location in the message so it can be looked up later. You also need to make sure that you don't have multiple consumers reading the same data stream; otherwise, you can get throughput errors.
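The metadata approach I mentioned can look roughly like this in Python; the bucket name, stream name, and field names are just assumptions for the sketch, and I am using S3 as the example store for the actual data:

```python
import json
import uuid
import boto3

s3 = boto3.client("s3")
kinesis = boto3.client("kinesis")
BUCKET = "raw-device-data"  # hypothetical bucket
STREAM = "device-events"    # hypothetical stream

def send_large_payload(payload: bytes, device_id: str):
    """Store the actual data in S3 and put only its location on the stream."""
    key = f"{device_id}/{uuid.uuid4()}.bin"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    pointer = {"device_id": device_id, "s3_bucket": BUCKET, "s3_key": key}
    kinesis.put_record(StreamName=STREAM,
                       Data=json.dumps(pointer).encode(),
                       PartitionKey=device_id)
```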
I rate the tool a nine out of ten.