Amazon Kinesis Valuable Features
Senior Software Engineer at a tech services company with 501-1,000 employees
The feature that I've found most valuable is replay. We are a business-to-business company, so being able to replay data for 24 hours was an important feature for our business.
In our use case Kinesis was able to handle the rate at which we were pumping in data and it could publish the data to whatever destination, be it Lambda or any other consumer.
We were seeing a delay in the Lambda processing time and in the subsequent storing into DynamoDB. We had assumed that the rate at which we were pumping in data would also be the rate at which the data was published and processed, but we saw that this was not working. The problem was not Kinesis itself; the subsequent parts of our application tended not to keep up with Kinesis. So the business asked us for the ability to get back to a certain point in time and replay the entire thing. That way there is a record if there is an error when the data is being processed.
The ordering is another big thing for us. Kinesis is known for maintaining the order in which the data is ingested, and we can configure Kinesis to ensure that the ordering is maintained. The order in which the data is actually published is also important for us. That is why the business was okay even if a thousand records failed to process, because they were okay to start from 500 again and reach a thousand again. They wanted to ensure that there was no scope for failure there. That is why both performance and replay are important. When I say performance, I mean reliability. Kinesis has an inbuilt replay mechanism, and that came in handy for us.
These were the crucial things that we were looking at, and it worked quite well.
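As an illustration of the point-in-time replay described above, here is a minimal sketch in Python. It assumes a client that behaves like `boto3.client("kinesis")`; the function name `replay_shard`, the stream name, and the shard ID in the usage comment are hypothetical. It is not the reviewer's implementation. Replay only works for records that are still within the stream's retention period.

```python
def replay_shard(client, stream_name, shard_id, start_time, max_batches=10):
    """Re-read records from one shard, starting at start_time.

    `client` is assumed to behave like boto3.client("kinesis").
    """
    # AT_TIMESTAMP positions the iterator at the first record at or after start_time.
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="AT_TIMESTAMP",
        Timestamp=start_time,
    )["ShardIterator"]

    records = []
    for _ in range(max_batches):
        if iterator is None:
            break  # the shard was closed and has been fully read
        response = client.get_records(ShardIterator=iterator, Limit=1000)
        # Within a shard, records come back in the order they were ingested.
        records.extend(response["Records"])
        if response.get("MillisBehindLatest", 0) == 0:
            break  # caught up with the tip of the shard
        iterator = response.get("NextShardIterator")
    return records


# Hypothetical usage (requires AWS credentials and an existing stream):
#   import boto3
#   from datetime import datetime, timedelta, timezone
#   client = boto3.client("kinesis")
#   since = datetime.now(timezone.utc) - timedelta(hours=24)
#   records = replay_shard(client, "orders-stream", "shardId-000000000000", since)
```

Passing the client in as a parameter keeps the replay logic easy to exercise against a stub before pointing it at a real stream.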
I think that all Kinesis components have their own features and their own value. Starting from Data Streams, you have to have it as the data queue or else you would need to go to Kafka or another message broker (with higher implementation effort if your ecosystem is fully hosted in AWS already). I think that the solution they have put together in Kinesis is fairly easy to use. It is definitely a core component in any data architecture.
On the other hand, I find Firehose super simple and super useful for certain use cases. I wouldn't say it is as essential as Data Streams, but it is very handy if you want to just dump data. The connection between Data Streams and Firehose allows you to do that without worrying too much about performance and configuration. I find Firehose super simple to use for a very specific use case, but that use case is very common.
Kinesis Analytics is definitely more cutting edge. Of the Kinesis components, this is the most innovative. We have used it for some alarms and for some batch processing in time windows. If we are talking about massive amounts of data, then you need to move to other big data solutions such as EMR or Glue. If the amount of data is manageable and you want something to analyze on the fly, Kinesis Analytics is very appropriate, and it gives you the ability to interact via SQL. So it makes your life easier if you want to develop a relatively self-contained application to do analytics on the fly.
I would say that Data Streams, in a matter of weeks, created a massive time saving. Something that we haven't factored in is cost savings: we don't need to repeat the same data flow multiple times, and each of those data flows has a cost associated with it. We're talking about a couple of hundred dollars per month, which is significant. In terms of time savings, we are on the scale of weeks.
Senior Software Engineer at a computer software company with 201-500 employees
The first would be the one found in the AWS SDK using the asynchronous client: the put record batch function (PutRecordBatch), which allows you to put a list of records in a single request, which saves time and is more efficient. Also, with the asynchronous client, the records are sent in the background using an internal thread pool that can be configured for your needs. In our performance testing, we found this setup to be the fastest solution. It didn't impact anything in the performance of the system process.
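To make the batching idea concrete, here is a rough sketch in Python. The reviewer used the asynchronous client in the AWS SDK; this sketch swaps in plain synchronous calls for brevity and assumes a client that behaves like `boto3.client("firehose")`. The helper names and the delivery stream name are hypothetical.

```python
import json

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(client, delivery_stream, events):
    """Send events with PutRecordBatch and return the records that failed.

    `client` is assumed to behave like boto3.client("firehose").
    """
    failed = []
    for chunk in chunked(events, MAX_BATCH):
        # One JSON document per line, so the objects written to S3 stay parseable.
        records = [{"Data": (json.dumps(e) + "\n").encode("utf-8")} for e in chunk]
        response = client.put_record_batch(
            DeliveryStreamName=delivery_stream, Records=records
        )
        # A call can partially succeed; failed entries carry an ErrorCode.
        if response["FailedPutCount"] > 0:
            for record, result in zip(records, response["RequestResponses"]):
                if "ErrorCode" in result:
                    failed.append(record)
    return failed


# Hypothetical usage (requires AWS credentials and an existing delivery stream):
#   import boto3
#   leftovers = send_in_batches(boto3.client("firehose"), "events-to-s3", events)
```

The returned list lets the caller decide whether to retry the failed records or log them.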
The second one would be the ability to link the stream to destinations other than S3 through the stream's configuration, without changing a line of code.
Lastly, you can also link a Lambda function to the stream to transform the data as it arrives, before writing it to S3, which is great for performing some aggregations or enriching the data with other data sources.
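A transformation function of the kind described above could look roughly like the following Python sketch. It follows the Firehose data transformation contract (base64-encoded `data`, and a `recordId` plus `result` returned for every record). The `processed` field is a made-up stand-in for whatever enrichment is actually needed.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record before it is delivered to S3."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["processed"] = True  # hypothetical enrichment step
            data = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data}
            )
        except (ValueError, TypeError):
            # Hand back the untouched payload and flag it as failed, so that
            # one bad record does not fail the whole batch.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Every incoming `recordId` appears exactly once in the response, which is what lets the service match results back to the original records.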
The most valuable feature is that it has a pretty robust way of capturing things. You can capture things from the beginning, or start capturing tweets at a certain point in time.
It has some good fault tolerance in case something breaks.
It's really easy to implement, get started, and use.
With AWS, you don't have to invest in any kind of infrastructure. All you have to do is go to the portal, create an account, turn it on, and use a few lines of Python code in order to capture what you're looking for.
The Kinesis API makes it really easy to put information on the shards. You just need a few lines of code.
I've used Kafka in the past, and Kinesis is a lot simpler. It's all hosted, which is really nice. There aren't too many knobs to turn or ways to screw up. It's a pretty simple product and a lot easier to manage because it's hosted by AWS, and it accomplishes what we need it to. The other nice thing is that we can make it available to external customers if they want to get a Kinesis feed of our things.
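For a sense of what "a few lines of code" can look like, here is a minimal Python sketch that assumes a client behaving like `boto3.client("kinesis")`. The function name, the stream name, and the `user_id` field are hypothetical. Records that share a partition key are routed to the same shard, which is what keeps them in order relative to each other.

```python
import json


def publish_event(client, stream_name, event, key_field="user_id"):
    """Put one event on the stream, using one of its fields as the partition key.

    `client` is assumed to behave like boto3.client("kinesis").
    """
    response = client.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        PartitionKey=str(event[key_field]),
    )
    # The response reports where the record landed and its position in the shard.
    return response["ShardId"], response["SequenceNumber"]


# Hypothetical usage (requires AWS credentials and an existing stream):
#   import boto3
#   publish_event(boto3.client("kinesis"), "tweets", {"user_id": 42, "text": "hello"})
```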
The features that I have found most valuable depend on the use case. I find Data Firehose and Data Streams are much more intelligent than other streaming solutions.
You can buffer by time as well as by data size. If you want to store data every 60 seconds, you can. If you want to store data once it reaches a certain size, you can do that, too. And then you can write it back to S3. That's the beauty of the solution.
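The time and size settings mentioned above correspond to the buffering hints on a Firehose S3 destination. The Python sketch below builds such a configuration as a plain dictionary; the function name, the ARNs, and the default values are placeholders. Firehose delivers a batch as soon as either condition is met, whichever comes first.

```python
def s3_destination_config(role_arn, bucket_arn, interval_seconds=60, size_mb=5):
    """Build an S3 destination configuration with time and size buffering."""
    return {
        "RoleARN": role_arn,
        "BucketARN": bucket_arn,
        "BufferingHints": {
            "IntervalInSeconds": interval_seconds,  # flush at least this often
            "SizeInMBs": size_mb,  # or earlier, once this much data is buffered
        },
    }


# Hypothetical usage (requires AWS credentials; the ARNs below are placeholders):
#   import boto3
#   boto3.client("firehose").create_delivery_stream(
#       DeliveryStreamName="events-to-s3",
#       DeliveryStreamType="DirectPut",
#       ExtendedS3DestinationConfiguration=s3_destination_config(
#           "arn:aws:iam::123456789012:role/firehose-demo",
#           "arn:aws:s3:::demo-bucket",
#       ),
#   )
```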
When it comes to Kinesis Firehose, the most valuable feature is the auto-scaling. It does auto-sharing, auto-correction, things like that, and responds dynamically. Secondly, it innately has all the features of a reliable data pipeline, allowing you to store raw documents and transform data on the fly. When data comes into the stream through Firehose, we can see and analyze every single object, keep the raw objects, carry out some transformations on them in flight, and then put them at rest. It allows us to do some real-time analytics using Kinesis Analytics. We do anomaly detection in flight as well. We receive any changes with regard to user patterns and behaviors in real time, because Kinesis allows that.
Big Data Architect
The solution is very flexible and allows for a lot of configuration. It just offers up a lot of possibilities.
I'm using Amazon S3 and Redshift using Amazon server. I can make large configurations and update in near real-time, so that we have real-time use for batch intervals.
The solution is great for scanning in order to handle environmental data.
The data stream feature on offer is excellent. We use it quite extensively.
The solution works well in rather sizable environments. We deal with a lot of data and it handles it very well.
The solution has a very good alerts system to allow us to respond in real-time.
The dashboards are excellent.
The solution offers very good data capture and integrates well with Power BI and Tableau, for example.
The product makes it very easy to create jobs.
At the moment, I am not using Amazon Kinesis, but Azure Event Hub, which I have found to be more meaningful and easier to use.
I like the event bubbling feature of Amazon Kinesis, although I ultimately switched to Azure Event Hub. Both solutions have similar features, but the latter offers us certain operational advantages.
Kinesis works in real time. It enables you to process streaming data in real time, so you can derive results in seconds or minutes instead of hours or days.
Kinesis is fully managed: you can run streaming applications without having to manage any infrastructure. It is also scalable. Kinesis can handle any amount of streaming data and process data from hundreds of thousands of sources with very low latency.