
Amazon Kinesis Overview

Amazon Kinesis is the #2 ranked solution in Streaming Analytics tools. PeerSpot users give Amazon Kinesis an average rating of 8 out of 10. Amazon Kinesis is most commonly compared to Apache Flink: Amazon Kinesis vs Apache Flink. The top industry researching this solution is computer software, with professionals from computer software companies accounting for 22% of all views.
What is Amazon Kinesis?

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. With Amazon Kinesis, you can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications. Amazon Kinesis enables you to process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.

Amazon Kinesis was previously known as Amazon AWS Kinesis, AWS Kinesis, and Kinesis.

Amazon Kinesis Buyer's Guide

Download the Amazon Kinesis Buyer's Guide including reviews and more. Updated: May 2022

Amazon Kinesis Customers

Zillow, Netflix, Sonos

Amazon Kinesis Pricing Advice

What users are saying about Amazon Kinesis pricing:
"Under $1,000 per month."

Amazon Kinesis Reviews

Senior Software Engineer at a tech services company with 501-1,000 employees
Real User
Top 10
Easily replay your streaming data with this reliable solution
Pros and Cons
  • "The feature that I've found most valuable is the replay. That is one of the most valuable in our business. We are business-to-business so replay was an important feature - being able to replay for 24 hours. That's an important feature."
  • "In general, the pain point for us was that once the data gets into Kinesis there is no way for us to understand what's happening because Kinesis divides everything into shards. So if we wanted to understand what's happening with a particular shard, whether it is published or not, we could not. Even with the logs, if we want to have some kind of logging it is in the shard."

What is our primary use case?

In the simpler use case, we were just pumping in some data. We wanted a product, an AWS service, that would accept data in bursts. We were pushing in, for example, 500 records every 300 milliseconds. In other words, we were trying to pump around 1,500 records per second into whichever streaming service we chose. That streamed data would then go to a consumer, for example Lambda, which would process the data and ultimately store it in DynamoDB.

This was the basic flow that we had, and we were looking for a service to support it. At that point in time, the architects in our organization wanted to see how Kinesis performed, so they were encouraging us to use it. Although we were looking at something as simple as SQS and SNS, they pushed for Kinesis, and that is what we did.
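As a rough sizing sketch for the burst pattern described above: 500 records every 300 ms works out to about 1,667 records per second. Kinesis Data Streams documents per-shard write limits of 1,000 records per second and 1 MiB per second in provisioned mode, so the example below takes whichever limit demands more shards. The average record size is an assumed input, not a figure from this review, and real sizing should leave headroom for spikes.

```python
import math

# Documented per-shard write limits for Kinesis Data Streams (provisioned mode).
MAX_RECORDS_PER_SHARD_PER_SEC = 1_000
MAX_BYTES_PER_SHARD_PER_SEC = 1024 * 1024  # 1 MiB


def min_shards(records: int, interval_ms: int, avg_record_bytes: int) -> int:
    """Estimate the minimum shard count for a steady burst pattern.

    records / interval_ms describes the burst (e.g. 500 records every 300 ms);
    avg_record_bytes is an assumed average payload size.
    """
    records_per_sec = records * 1000 / interval_ms
    bytes_per_sec = records_per_sec * avg_record_bytes
    # Each limit independently implies a shard count; the larger one wins.
    by_count = math.ceil(records_per_sec / MAX_RECORDS_PER_SHARD_PER_SEC)
    by_bytes = math.ceil(bytes_per_sec / MAX_BYTES_PER_SHARD_PER_SEC)
    return max(1, by_count, by_bytes)
```

With an assumed 1 KiB average record, min_shards(500, 300, 1024) returns 2, so a single shard would be throttled at that rate.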

There were a few considerations when we moved to Kinesis. The first was reliability. When I say reliability, I mean resilience, or the failure mechanism we thought that use case required, because we did not want to lose data. We also wanted the ability to replay from a certain point, because we were pumping in records from a data source and always keeping track of the point at which we had stopped. If data that Kinesis had already delivered failed in the Lambda, we wanted to be able to retry and replay that previously processed part of the stream.

That prompted us to use Kinesis, because it has the really good feature of letting you replay whatever has already been processed within the last 24 hours. That was one key feature that we thought we would need. Performance-wise, it performed really well. We also understood that it is meant for streaming, including video streaming, and it does a good job with data streaming too. But mostly, we saw it as a more suitable service for video streaming, simply because when we pump data into Kinesis, we don't know how to test it other than waiting for the data to come out the other end, hooking into Lambda, and extracting and processing it there.

That's the only way we can test it. That was a drawback, but it did not matter too much for this project. It did matter in the next project, where the use cases were bigger. This project was a simple use case and Kinesis served it really well, so we kept it as-is. The next project was bigger: an event-driven architecture that we were trying out on one of our features. At the time, a few of the features and services that Amazon offers now were not yet available.

We thought of using Kinesis again to stream data from one microservice to another in a proper microservice architecture, using it as the communication medium between microservices. This is where testing got a little complicated for us. Ultimately, what we realized from the entire exercise was that Kinesis may not have been the right service for our use case. But we did discover the benefits of using Kinesis, and also its limitations in certain use cases.

The biggest lesson for us was that before you take up anything like Kinesis, which is a big AWS service, you need to do a proof of concept (POC) to see whether it really suits the use case. Before that, there were a few reasons why we chose Kinesis over DynamoDB Streams. The data was going from one microservice to another, and each microservice had its own DynamoDB data store.

We were weighing DynamoDB Streams against Kinesis, trying to keep things simple. But it turned out that DynamoDB Streams have a limitation: whatever comes out of a DynamoDB stream can be consumed by only a single client. With Kinesis that doesn't matter. Any number of data sources can feed into it, and whatever Kinesis publishes can be consumed by any number of clients. That is why we went with Kinesis to see how it performed. Performance-wise, we needed to handle a crazy load, because we are in the wagering industry, which needs peak performance: online betting. In Australia, it's a regulated market and one of the most happening businesses. Performance is really important here, because there are around 10 to 15 prominent competitors, and if we want to stand out, our performance has to be beyond the customer's expectations.

With that in mind, the business knew our performance had to scale up, and that is where we found the advantage of using Kinesis. It has been reliable. It did fail once, but only because we were pumping in more data than Kinesis could take in.

There is a limit that we discovered. I don't remember the numbers there. But we did manage to break Kinesis by pumping in too much data.
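When a producer exceeds a shard's write limits, Kinesis Data Streams rejects the excess records with a throttling error (for PutRecords, individual entries come back with an ErrorCode such as ProvisionedThroughputExceededException). A common mitigation is to retry only the failed entries with capped exponential backoff and jitter. The sketch below is an illustration, not an AWS SDK feature: send is a hypothetical stand-in for a real PutRecords call that reports one result per record, and the delay parameters are assumptions.

```python
import random
import time
from typing import Callable, List, Optional


def put_with_backoff(
    records: List[bytes],
    send: Callable[[List[bytes]], List[Optional[str]]],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bytes]:
    """Send records, retrying only the ones that failed.

    `send` is a hypothetical stand-in for a PutRecords call: it returns one
    entry per input record, None on success or an error code string on failure.
    Returns the records that still failed after max_attempts.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        results = send(pending)
        # Keep only the records whose result carries an error code.
        pending = [rec for rec, err in zip(pending, results) if err is not None]
        if not pending:
            return []
        if attempt < max_attempts - 1:
            # Full jitter: sleep a random amount up to the capped exponential delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    return pending
```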

How has it helped my organization?

The major advantage of Amazon Kinesis is the availability. Additionally, the reliability is awesome when it comes to Kinesis. Kinesis also offers the replay.

It is incredibly fast. Ingesting, buffering, and processing the data are all really fast. With AWS you always get the dashboard for monitoring, which is a really good way for us to see how Kinesis is performing. Otherwise, there is no way for us to know what's happening within Kinesis other than the Lambda kicking in and processing. So the Lambda logs were indirectly necessary for us to look into Kinesis.

The dashboarding AWS provides out of the box for monitoring performance is quite nice. Also, it is a fully managed service, so we don't need to worry about what happens behind Kinesis. That was another big win for us: we did not have to worry about how to maintain or manage Kinesis in general. It is kind of serverless.

The scalability was quite acceptable. Kinesis can take in and process a huge amount of data from many sources, although there is a limit. We can have any number of data sources coming in, and it can ingest all of them and publish the data wherever you want.

You can design your code in such a way that the Lambda that actually processes whatever is published by Kinesis can kind of segregate the data coming in from multiple data sources, based on the logic that is implemented there. That is a nice feature. Ingesting data from multiple sources, and being able to publish it to multiple destinations.
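A minimal sketch of the segregation logic described above, assuming each producer writes a JSON payload with a "source" field (a hypothetical convention, not part of Kinesis). Lambda delivers Kinesis records in event["Records"], with each payload base64-encoded under record["kinesis"]["data"]; the handler decodes each record and groups payloads by source. The process function is a placeholder for the per-source work.

```python
import base64
import json
from collections import defaultdict


def handler(event, context=None):
    """Lambda handler sketch: decode Kinesis records and group them by source.

    Assumes each producer writes a JSON payload with a "source" field; that
    field name is hypothetical, not part of Kinesis.
    """
    by_source = defaultdict(list)
    for record in event.get("Records", []):
        # Kinesis payloads arrive base64-encoded in the Lambda event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        by_source[payload.get("source", "unknown")].append(payload)

    for source, items in by_source.items():
        process(source, items)
    # Return per-source counts so the grouping is easy to inspect in tests/logs.
    return {source: len(items) for source, items in by_source.items()}


def process(source, items):
    # Placeholder for per-source logic (e.g. writing to a DynamoDB table).
    pass
```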

What is most valuable?

The feature that I've found most valuable is the replay. That is one of the most valuable in our business. We are business-to-business so replay was an important feature - being able to replay for 24 hours. That's an important feature.
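To make the replay idea concrete, the toy model below treats a shard as an append-only log that is trimmed to a 24-hour retention window (the Kinesis Data Streams default, which can be extended). Reading from a timestamp or a sequence number loosely mirrors the AT_TIMESTAMP and AT_SEQUENCE_NUMBER iterator types a real consumer passes to GetShardIterator. The class and method names are hypothetical; this is an in-memory illustration, not the Kinesis API.

```python
from dataclasses import dataclass, field

RETENTION_SECONDS = 24 * 60 * 60  # Kinesis Data Streams default retention


@dataclass
class Record:
    sequence_number: int
    arrival_ts: float  # seconds since epoch
    data: bytes


@dataclass
class InMemoryShard:
    """Toy model of one shard: an append-only log trimmed by retention."""
    records: list = field(default_factory=list)
    _next_seq: int = 0

    def put(self, data: bytes, now: float) -> int:
        self._trim(now)
        rec = Record(self._next_seq, now, data)
        self.records.append(rec)
        self._next_seq += 1
        return rec.sequence_number

    def _trim(self, now: float) -> None:
        # Drop anything older than the retention window; it can no longer be replayed.
        cutoff = now - RETENTION_SECONDS
        self.records = [r for r in self.records if r.arrival_ts >= cutoff]

    def read_at_timestamp(self, ts: float, now: float) -> list:
        """Replay everything that arrived at or after ts (like AT_TIMESTAMP)."""
        self._trim(now)
        return [r.data for r in self.records if r.arrival_ts >= ts]

    def read_at_sequence(self, seq: int, now: float) -> list:
        """Replay starting at a known sequence number (like AT_SEQUENCE_NUMBER)."""
        self._trim(now)
        return [r.data for r in self.records if r.sequence_number >= seq]
```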

In our use case Kinesis was able to handle the rate at which we were pumping in data and it could publish the data to whatever destination, be it Lambda or any other consumer.

We were seeing a delay in the Lambda processing and the subsequent writes into DynamoDB. We had assumed that the data would be published and processed at the same rate at which we were pumping it in, but that was not the case: the downstream parts of our application were not keeping up with Kinesis. So the business asked us for the ability to go back to a certain point in time and replay everything from there. That way, if there is an error while a record is being processed, there is a point to go back to.

Ordering is another big thing for us. Kinesis is known for maintaining the order in which data is ingested, and we can configure it to ensure that ordering is maintained. The order in which the data is published is also important for us. That is why the business was okay even if processing failed at record 1,000: they were fine with starting again from record 500 and working back up to 1,000. What they wanted to ensure was that there was no scope for failure there. That is why the replay feature was useful for us, and why both performance and replay are important. When I say performance, I mean reliability. The built-in replay mechanism in Kinesis also came in handy for us.

These were the crucial things that we were looking at, and it worked quite well.
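Kinesis Data Streams guarantees ordering within a shard, and the partition key decides which shard a record lands in: the key is hashed with MD5 into a 128-bit integer, and each shard owns a contiguous range of that space. The sketch below assumes the range is split evenly across shards and that keys are hashed as UTF-8 bytes; a real stream's boundaries come from ListShards. It shows why records that share a partition key (say, one bet ID, a hypothetical key choice) stay in order relative to each other.

```python
import hashlib
from collections import defaultdict

HASH_SPACE = 2 ** 128


def shard_index(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard, assuming evenly split hash key ranges.

    Kinesis hashes the partition key with MD5 into a 128-bit integer; each
    shard owns a contiguous range of that space.
    """
    h = int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")
    # Scale the 128-bit hash into [0, shard_count).
    return h * shard_count // HASH_SPACE


def route(records, shard_count):
    """Group (partition_key, payload) pairs by shard, preserving arrival order."""
    shards = defaultdict(list)
    for key, payload in records:
        shards[shard_index(key, shard_count)].append((key, payload))
    return shards
```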

What needs improvement?

In general, the pain point for us was that once the data gets into Kinesis, there is no way for us to understand what's happening, because Kinesis divides everything into shards. If we wanted to understand what was happening with a particular shard, for example whether its data had been published or not, we could not. Even with logging, whatever we wanted to see was inside the shard. That is something we thought we needed then, but later we realized that Kinesis was not built for that. They may well have improved this by now, but I have not been in touch with AWS for the last five or six months, since I joined this organization, which uses Azure. I did not get to experiment with AWS Kinesis much after that.

Kinesis was built for something else: we used it for one purpose and expected a feature out of it that may not have been part of the design when they built the service. It was almost like a black box for us, because once the data comes in, we have to rely on the Lambda to tell us what happened. As each Kinesis record comes in and is processed, we log it from the Lambda, and that is how we would know, "Oh, okay, this one has come in, and this one has come in." We hoped for a better way to track which shard was being processed, or how records flowed within Kinesis.

We wanted to have a look at that, but it was not available then, and it may not be available now. We did not get the feature we had expected from Kinesis in the first place. Overall, that was the only thing we felt was lacking. Our use case may not have been the most ideal one, but other than that we did not have many qualms with Kinesis. Looking back, we felt we could have simplified the entire design by simply using SNS and SQS, because they give us much better visibility into what happens inside them.
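Since visibility came from the Lambda side, one way to get at least a per-shard view is to summarize each batch by shard before processing it. In the sample Kinesis events that Lambda delivers, eventID has the form "shardId-...:sequenceNumber", and each record also carries kinesis.sequenceNumber. The helper below is a hypothetical sketch that relies on that format; it reports only what reached the function, not what is happening inside Kinesis.

```python
import logging
from collections import defaultdict

logger = logging.getLogger(__name__)


def summarize_by_shard(event):
    """Return {shard_id: (count, first_seq, last_seq)} for a Kinesis Lambda event.

    Relies on eventID having the form "<shardId>:<sequenceNumber>", as in the
    sample Kinesis events Lambda delivers.
    """
    seen = defaultdict(list)
    for record in event.get("Records", []):
        shard_id = record["eventID"].split(":", 1)[0]
        seen[shard_id].append(record["kinesis"]["sequenceNumber"])
    summary = {}
    for shard_id, seqs in seen.items():
        # Sequence numbers are numeric strings; compare them as integers.
        ordered = sorted(seqs, key=int)
        summary[shard_id] = (len(seqs), ordered[0], ordered[-1])
        logger.info(
            "shard=%s count=%d first=%s last=%s",
            shard_id, len(seqs), ordered[0], ordered[-1],
        )
    return summary
```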

For how long have I used the solution?

I have used Amazon Kinesis for a couple of projects starting from August 2019 until July 2020. I used Amazon Kinesis in exactly two projects in fact, one after the other.

What do I think about the scalability of the solution?

In terms of scalability, there is a limit which is documented by Amazon. But when we started using it, we didn't know that. We did not evaluate its complete documentation. Of course we went through the aspects that we wanted to understand and we made the choice. But it did break at a certain point.

It was okay for us, simply because we could live with a lower ingestion rate. So it did not cause much of a hazard for the business as such, but we did manage to break Kinesis.

Overall, what we realized was that for simple use cases where you need reliable streaming, Amazon Kinesis works really well. But for an event-driven architecture, it may not be the best choice.

That's what we figured out at the end of our project. The project was successful. It served its purpose. But the amount of support that we had to provide to see that the entire infrastructure holds up to the load was high.

We felt that we could have done with an easier adaptation of the same architecture. We could have gone with an easier implementation, by probably choosing SNS and SQS over Kinesis in our use case. So, lessons learned.

This is all that we worked on with Kinesis. This is what we figured out after close to a year of working with it. One project was no problem at all. Whatever the purpose, Kinesis did more than expected. And, in the other one we kind of hit the boiling point of Kinesis and realized that it may not be the right choice in that scenario. But it was still okay. We still left it there, and it served its purpose.

How are customer service and technical support?

We had an Amazon technical advisor who visited us once every week, on the same day. He would be on-site, and we could reach out to him and ask for suggestions about what we could use and what we should do. He would help us with whatever queries we had, and if he did not know the answer, he would go back to the Amazon experts and get it for us. But in the case of Kinesis, the decision was driven more by the architecture teams here, who wanted us to try it out and see how it performed.

We did go to the Amazon technical support person who was available to us, to understand the limitations and the use cases. He did help us, but we were deep into our implementation by then, almost at the end, so we could not change course or accommodate his suggestions. But yes, his inputs were definitely valuable in helping us understand Kinesis better.

How was the initial setup?

In terms of initial setup, Kinesis is readily available for us to use. All we need to do is look at what stack we are using. For example, our stack consists of a Lambda, a Kinesis stream, DynamoDB, and some data source, probably another Lambda. So a Lambda feeds data into Kinesis, and Kinesis publishes it to another Lambda. I'm just giving an example. All four components belong to one stack, so there's not much to set up other than making sure everything is defined in CloudFormation, which we used to maintain stacks separately. Kinesis had to be part of the stack's CloudFormation template, and both the source and the destination need permissions to access Kinesis. Other than that, there's not much to set up, because Kinesis is a fully managed service.
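The stack described above might be sketched as a CloudFormation template. To keep to one language, the example builds the template as a Python dictionary and serializes it to JSON. It defines an AWS::Kinesis::Stream and an AWS::Lambda::EventSourceMapping that triggers a consumer function. ConsumerFunction is a hypothetical logical ID for a Lambda function defined elsewhere in the same template, whose execution role needs read access to the stream (for example through the AWS-managed AWSLambdaKinesisExecutionRole policy). The shard count, retention, and batch size are placeholder values.

```python
import json


def kinesis_stack_template(shard_count: int = 2, retention_hours: int = 24) -> dict:
    """Build a partial CloudFormation template: one stream plus a Lambda trigger.

    "ConsumerFunction" is a hypothetical logical ID for a Lambda function that
    must be defined elsewhere in the same template.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "IngestStream": {
                "Type": "AWS::Kinesis::Stream",
                "Properties": {
                    "ShardCount": shard_count,
                    "RetentionPeriodHours": retention_hours,
                },
            },
            "StreamToConsumer": {
                # Wires the stream to the consumer Lambda.
                "Type": "AWS::Lambda::EventSourceMapping",
                "Properties": {
                    "EventSourceArn": {"Fn::GetAtt": ["IngestStream", "Arn"]},
                    "FunctionName": {"Ref": "ConsumerFunction"},
                    "StartingPosition": "TRIM_HORIZON",
                    "BatchSize": 100,
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(kinesis_stack_template(), indent=2))
```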

What about the implementation team?

We were four developers and one principal developer, who guided us from the architecture standpoint during setup.

What's my experience with pricing, setup cost, and licensing?

I think there is a paid version only, there is no free version. I think it is possibly on the expensive side.

I did not go too deep into pricing, because our business did not care about pricing that much. They just wanted the product to be solid and stable at all times. The business is generally conservative about services and pricing, but this was a case where the price did not matter.

I did not explore that much into the pricing of Kinesis, per se.

Which other solutions did I evaluate?

I'm aware of Kafka streaming, but I have not used it in any project. This was the only streaming service that I used.

Here, we mostly use Azure Web Apps, Azure Web Jobs, and function apps, which are similar to Lambda. My exposure here is not as extensive as it was in my previous organization, where the entire infrastructure was on cloud. In my current organization it's only partially on cloud, so my exposure to Azure services is limited at this point.

What other advice do I have?

With my limited exposure to Kinesis, and with the pain points and probably not using it properly, we did see that it was successful. Having said all that, and the pain points that we went through, on a scale of one to ten I would give Kinesis an eight out of 10.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Pablo Giner - PeerSpot reviewer
Head of BI at Wind Mobility
Real User
Top 10
The ability to have one single flow of inputting data from multiple consumers simplified our architecture
Pros and Cons
  • "Amazon Kinesis has improved our ROI."
  • "Something else to mention is that we use Kinesis with Lambda a lot and the fact that you can only connect one Stream to one Lambda, I find is a limiting factor. I would definitely recommend to remove that constraint."

What is our primary use case?

In terms of use cases, it depends on which component we're talking about, as we use three of the four components. The only one we don't use is Video Streams.

Kinesis Data Stream is the module that we have been using the longest, essentially we use it to hold data which will be processed by multiple consumers. We have multiple data sources and we use Kinesis to funnel that data which is then consumed by multiple other consumers. We gather data coming from IoT devices, user phones, databases and a variety of other sources and then, as we have multiple consumers, we use Kinesis to actually gather the data and then we process it directly in Lambda, in Firehose, or in other applications.
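The pattern of many sources feeding many consumers works because every consumer reads the same ordered data and tracks its own position. In Kinesis, KCL applications checkpoint per shard in a DynamoDB lease table, and Lambda event source mappings track their own position. The in-memory model below (class and method names are hypothetical) illustrates only the core idea: publishing once and letting each consumer advance independently.

```python
class FanOutLog:
    """Toy model: many consumers read the same ordered log independently."""

    def __init__(self):
        self._log = []
        self._positions = {}  # consumer name -> index of next record to read

    def publish(self, record):
        self._log.append(record)

    def register(self, consumer: str, from_start: bool = True):
        # A new consumer starts at the oldest record or at the current tip.
        self._positions[consumer] = 0 if from_start else len(self._log)

    def poll(self, consumer: str, max_records: int = 100):
        start = self._positions[consumer]
        batch = self._log[start:start + max_records]
        # Checkpoint: advance only this consumer's position.
        self._positions[consumer] = start + len(batch)
        return batch
```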

How has it helped my organization?

Amazon Kinesis has absolutely improved our organization. Before Data Streams, we were using a couple of other solutions, including Talend and Pentaho, to move data around, and each of them was its own silo. The ability to have one single flow of data feeding multiple consumers simplified our architecture a lot, because you don't need to copy or read the data multiple times; you just pull that data once and then use multiple consumers. It will also help us in the future when we have to build additional applications based on the same input data. We already have that data available, so it will just be a matter of building the application itself. So it saves us a lot of time.

For Firehose, we see time savings as a result of adopting it. It takes a couple of minutes to configure, and it saves quite a lot of time getting our information into the data lake.

Regarding Kinesis Analytics, we have real-time alarms and real-time data flows to populate other systems. For example, we populate Salesforce using a tumbling window implemented with Kinesis Data Analytics and Lambda. We also have alarms for things like knowing when someone is affecting our assets and we need to warn the operators in real-time. So Kinesis Analytics has actually given us the ability to track things in real-time that before we didn't have the ability to track.

Because we couldn't do that in the database, we needed a component that could get the last window of data super quickly and, if something was wrong, identify the failing record or the information we wanted to trigger on, and notify the user through Lambda. At certain points, when we had operational issues, we implemented alarms on the key indicators to help us attack those issues before they grew and it was too late. So that has been essential for us.
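A tumbling window splits time into fixed, non-overlapping intervals, so each event counts toward exactly one window. Kinesis Data Analytics expresses this in SQL; the plain-Python sketch below only illustrates the semantics, with a hypothetical threshold check standing in for the alarms described above. Window length, keys, and threshold are assumptions.

```python
from collections import defaultdict


def tumbling_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key). Each event belongs to exactly
    one window, identified by its start time.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        start = int(ts // window_seconds) * window_seconds
        windows[start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


def alarms(window_counts, threshold):
    """Yield (window_start, key, count) where a key exceeds the threshold."""
    for start, counts in window_counts.items():
        for key, count in counts.items():
            if count > threshold:
                yield start, key, count
```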

What is most valuable?

I think that all Kinesis components have their own features and their own value. Starting from Data Streams, you have to have it as the data queue or else you would need to go to Kafka or another message broker (with higher implementation effort if your ecosystem is fully hosted in AWS already). I think that the solution they have put together in Kinesis is fairly easy to use. It is definitely a core component in any data architecture. 

On the other hand, I find Firehose super simple and super useful for certain use cases. I wouldn't say it is as essential as Data Streams, but it is very handy if you want to just dump data. The connection between Data Streams and Firehose allows you to do that without worrying too much about performance and configuration. I find Firehose super simple to use for a very specific use case, but that use case is very common.

Kinesis Analytics is definitely more cutting edge. Out of Kinesis this is the most innovative part. We have used it for some alarms and for some batch processing in time windows. If we are talking about massive amounts of data, then you need to move to other solutions such as EMR or Glue for big data. If the amount of data is manageable and you want something to analyze on the fly, Kinesis Analytics is very appropriate and it gives you the ability to interact via SQL. So it makes your life easier if you want to develop a relatively self-contained application to do analytics on the fly.

I would say that Data Streams created a massive time-saving in a matter of weeks. Something that we haven't factored in is cost savings: we don't need to repeat the same data flow multiple times, and each of those data flows has a cost associated with it. We're talking about a couple of hundred dollars per month, which is significant. In terms of time savings, we are on the scale of weeks.

What needs improvement?

In terms of what can be improved, within Data Streams you have a variety of ways to interact with the data: you have the Kinesis Client Library (KCL), and you have the Kinesis Agent. When we were developing our architecture a couple of years back, all the libraries to aggregate the data were very problematic. The Kinesis Aggregator, which essentially improves performance and cost by aggregating individual records into bigger ones, is something that I found had a lot of room for improvement to make it more refined. At the time I found a couple of limitations that I had to work around, so I definitely see room for improvement on that side.
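To illustrate the idea behind aggregation, the sketch below packs many small JSON items into fewer, larger newline-delimited blobs, each capped at the 1 MiB Kinesis Data Streams record size limit. This is a deliberately simplified stand-in, not the KPL aggregation record format, so consumers would need the matching deaggregate helper. Function names are hypothetical.

```python
import json
from typing import Iterable, List

MAX_RECORD_BYTES = 1024 * 1024  # Kinesis Data Streams data blob limit (1 MiB)


def aggregate(items: Iterable[dict], max_bytes: int = MAX_RECORD_BYTES) -> List[bytes]:
    """Pack small JSON items into newline-delimited blobs no larger than max_bytes.

    This is a simplified stand-in, NOT the KPL aggregation format, so consumers
    must split on newlines themselves.
    """
    blobs, current, size = [], [], 0
    for item in items:
        line = json.dumps(item, separators=(",", ":")).encode("utf-8") + b"\n"
        if len(line) > max_bytes:
            raise ValueError("single item exceeds max_bytes")
        # Close the current blob if this line would push it over the cap.
        if size + len(line) > max_bytes and current:
            blobs.append(b"".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        blobs.append(b"".join(current))
    return blobs


def deaggregate(blob: bytes) -> List[dict]:
    """Split a blob produced by aggregate() back into its items."""
    return [json.loads(line) for line in blob.splitlines() if line]
```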

Something else to mention is that we use Kinesis with Lambda a lot and the fact that you can only connect one Stream to one Lambda, I find is a limiting factor. I would definitely recommend to remove that constraint.

For how long have I used the solution?

I have been using Amazon Kinesis for over 2 years.

What do I think about the stability of the solution?

Kinesis is super stable. This is one of the few components in AWS with which we have never had any stability issues.

What do I think about the scalability of the solution?

Regarding scalability, you wouldn't use Kinesis Analytics for huge, vast amounts of data or for complex processing. It's for relatively simple processing with not too much data. So I wouldn't say that it is infinitely scalable, it really depends on your application and the volume of data.

Right now I don't see us using more of Kinesis. It has a very clear role in our architecture and fulfills it perfectly well. This is one of the initial components that you build; in a roadmap, it would be the first 10%. All our work right now is going into other areas, and we don't have any plans to grow Kinesis further. We used to do some specific real-time analysis with Kinesis Analytics on a case-by-case basis. It's more on a per-need basis.

In other companies we use Kafka, but we didn't replace it with Kinesis.

How was the initial setup?

The initial setup is relatively straightforward.

Setting up Kinesis Data Streams is no big deal: you just choose the number of shards, assign a name to your stream, and that's pretty much it. The effort is in the applications that talk to Kinesis. I would say implementation took around six weeks, and deployment took just two people.

We had our own internal strategy, which we started from scratch, so we knew which components we would be deploying first. At the time, we didn't use CloudFormation or CodeBuild. We now use these tools all the time for managing the architecture and CI/CD, but we didn't have them for the initial deployment.


What was our ROI?

Amazon Kinesis has improved our ROI. We obviously pay monthly for Kinesis but for us it is an enabler. We wouldn't have an architecture, or we'd have a terrible architecture, if Kinesis wasn't there. 

For the data analytics component, we definitely saw our ROI improve. We implemented the alarms for very critical tasks when we had a company issue, and thanks to Kinesis Analytics they gave us visibility and a prompt response to those issues. So that has definitely proven its ROI.

What's my experience with pricing, setup cost, and licensing?

In terms of price, I think it is fair. Kinesis Data Streams has a very fair price relative to the value it provides, and the same goes for Firehose. Kinesis Analytics, I find, is on the more expensive side, because it's a newer component, something fewer people use, and something more innovative, cutting edge, and specific. Kinesis Analytics is the only one I might complain about if you care about low pricing.

Which other solutions did I evaluate?

Kafka is comparable to Data Streams, not to Kinesis Analytics. For Analytics on the fly, I can talk about doing Spark streaming, which is a lot more complex and you need to spend a lot more time setting it up, but it also has more capability in terms of the scaling, so I wouldn't say it's a one-to-one comparison.

I also used StreamSets in the past, where you can gather data and you can also do some transformations on the fly. But again it's not comparable one-to-one so I wouldn't use it for the same use cases.

What other advice do I have?

My recommendation for Data Streams is to do a deep dive into the documentation before implementing, to avoid what we did at the beginning. We tried to process record by record, or push record by record into Kinesis, and then realized that it is neither cost-effective nor efficient. You need to aggregate your data before you push it into Kinesis, so reading up on the best practices for using Kinesis is definitely something I would recommend to anyone. For Kinesis Analytics, I was actually surprised at how easy it is to use an application with such power. I would say that with a trial, users will realize that for a fairly complex application such as Kinesis Analytics, you can get something working very quickly with minimal resources, and it gives you a lot of value for specific use cases.

On a scale of one to ten, I would give Amazon Kinesis a nine. I don't have much to complain about Kinesis.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Software Engineer at a computer software company with 201-500 employees
Real User
Top 10
Fast solution that saves us a lot of time
Pros and Cons
  • "Amazon Kinesis also provides us with plenty of flexibility."
  • "I think the default settings are far too low."

What is our primary use case?

I work as a senior software engineer at an eCommerce analytics company, where we have to process a huge amount of data.

Only a few people within our organization use Kinesis. My team, which includes three backend developers, simply wanted to test out different approaches.

We are now in the middle of migrating our existing MySQL and Postgres databases to Snowflake. We use Kinesis Firehose to ingest data into Snowflake at the same time that we ingest data into MySQL, without impacting performance.

If you write to two databases synchronously, performance is very slow. We wanted to avoid that, so we came up with this solution of ingesting the data through the stream.

We use Kinesis Firehose to send the data to the stream, which then buffers it for roughly two minutes. Afterwards, it places the files in an S3 bucket, from which they are loaded automatically via a Snowflake integration called Snowpipe. Snowpipe reads and ingests every message in every file in the S3 bucket. This stage doesn't bother us, because we don't need to wait for it. We just stream the data: fire and forget. Sometimes, if a record is not ingested successfully, we have to retry. Apart from that, it's great, because we don't need to wait and the performance is great.

There are some caveats, but overall, the performance and reliability have been great. This year, 100% of the time when there was an issue in production, it was due to a bug in our code rather than a bug in Kinesis.
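Firehose delivers buffered data once either a size threshold or a time interval is reached, whichever comes first; the roughly two-minute delay described above corresponds to the interval. The toy model below shows that flush rule. The class name and thresholds are hypothetical, and each flushed entry simply stands in for one object written to S3.

```python
from typing import List, Optional


class SizeOrTimeBuffer:
    """Toy model of Firehose-style buffering: flush on size OR interval, whichever comes first."""

    def __init__(self, max_bytes: int, interval_seconds: float):
        self.max_bytes = max_bytes
        self.interval_seconds = interval_seconds
        self._items: List[bytes] = []
        self._size = 0
        self._opened_at: Optional[float] = None  # when the current buffer got its first item
        self.flushed: List[bytes] = []  # each entry stands in for one S3 object

    def add(self, data: bytes, now: float) -> None:
        if self._opened_at is None:
            self._opened_at = now
        self._items.append(data)
        self._size += len(data)
        self.tick(now)

    def tick(self, now: float) -> None:
        """Flush if either threshold has been reached; call periodically."""
        if not self._items:
            return
        if self._size >= self.max_bytes or now - self._opened_at >= self.interval_seconds:
            self.flushed.append(b"".join(self._items))
            self._items, self._size, self._opened_at = [], 0, None
```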

How has it helped my organization?

We save a lot of time with Kinesis, but it's difficult to measure just how much. We have something similar for some other processes: elsewhere, we developed a tool that takes the contents of a stream, writes them into a file, uploads the file to S3 itself, and copies it into Snowflake. That could have been done with Kinesis, and it would have taken two weeks to a month less to get production-ready.

What is most valuable?

The first is the PutRecordBatch function in the AWS SDK's asynchronous client, which lets you put a list of records in a single request. This saves time and is more efficient. Also, with the asynchronous client, records are sent in the background by an internal thread pool that you can configure for your needs. In our performance testing, this setup was the fastest solution, and it didn't affect the performance of the rest of the system.
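A rough Python analogue of batching plus background sending looks like this. The reviewer used the Java SDK's asynchronous client; this sketch swaps in a `concurrent.futures` thread pool instead. It assumes Firehose's PutRecordBatch caps of 500 records and about 4 MiB per call, and takes an injected `send` function (for example, a wrapper around boto3's `put_record_batch`) so it can run without AWS.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Firehose PutRecordBatch accepts at most 500 records and 4 MiB per call.
MAX_RECORDS_PER_BATCH = 500
MAX_BYTES_PER_BATCH = 4 * 1024 * 1024


def chunk_records(
    records: Iterable[bytes],
    max_records: int = MAX_RECORDS_PER_BATCH,
    max_bytes: int = MAX_BYTES_PER_BATCH,
) -> Iterator[List[bytes]]:
    """Yield batches that respect both the record-count and byte-size caps."""
    batch: List[bytes] = []
    size = 0
    for data in records:
        # Start a new batch if adding this record would break either cap.
        # (A single oversized record still gets its own batch; the service
        # would reject it, so validate record size upstream.)
        if batch and (len(batch) >= max_records or size + len(data) > max_bytes):
            yield batch
            batch, size = [], 0
        batch.append(data)
        size += len(data)
    if batch:
        yield batch


def send_all(send: Callable[[List[bytes]], T], records: Iterable[bytes], workers: int = 4) -> List[T]:
    """Send batches concurrently on a thread pool; results keep batch order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(send, chunk_records(records)))
```

Keeping chunking separate from sending makes it easy to test the size logic locally and to tune `workers`, which plays the role of the SDK's configurable thread pool.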

The second is the ability to link the stream to destinations other than S3 through the stream's configuration, without changing a line of code.

Lastly, you can link a Lambda function to the stream to transform the data as it arrives, before it is written to S3. This is great for performing aggregations or enriching the data with other data sources.
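A Firehose transformation Lambda of the kind described above receives base64-encoded records and must return each `recordId` with a `result` of `Ok`, `Dropped`, or `ProcessingFailed`. The following is a minimal sketch. The enrichment (adding a hypothetical `pipeline` field to JSON records) is only a placeholder for a real lookup against another data source.

```python
import base64
import json


def lambda_handler(event, context):
    """Kinesis Data Firehose transformation Lambda: decode, enrich, re-encode."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical enrichment: tag each JSON object with a field.
            payload["pipeline"] = "firehose-transform"
            # Firehose concatenates records in S3, so end each one with a newline.
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append({"recordId": record["recordId"], "result": "Ok",
                           "data": encoded.decode("ascii")})
        except (ValueError, TypeError):
            # Malformed input: hand the original bytes back, marked as failed.
            output.append({"recordId": record["recordId"], "result": "ProcessingFailed",
                           "data": record["data"]})
    return {"records": output}
```

Records marked `ProcessingFailed` are not silently lost. Firehose delivers them to an error output in S3 instead of the main destination.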

What needs improvement?

The default limit, which at the moment is 5,000 records per second (I'm talking about Kinesis Firehose, which is a specialized form of the Amazon Kinesis service), seems too low. In fact, in the first week after we deployed it into production, we had to roll it back and ask Amazon to increase the default limits.

It's mentioned in the documentation, but I think the default settings are far too low. That first week, ingestion was extremely slow because records were being throttled rather than ingested into the stream, so we had to retry them. After we talked with Amazon, they raised our throttling limit to 10,000 records per second. Now it works fine.
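The reviewer's fix was a quota increase from AWS. A common complement on the producer side, which is our suggestion rather than something the reviewer describes, is to cap the send rate so that bursts back off locally instead of being throttled by the service. The following token-bucket sketch uses the reviewer's 5,000 records per second only as an example. Actual Firehose quotas vary by region and account.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Client-side limiter: allows about `rate_per_sec` records per second."""

    def __init__(self, rate_per_sec: float, capacity: Optional[float] = None,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = float(rate_per_sec)
        # Burst size defaults to one second's worth of records.
        self.capacity = float(rate_per_sec if capacity is None else capacity)
        self.tokens = self.capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self, n: int = 1) -> bool:
        """Take n tokens if available; return False (caller should wait) otherwise."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A producer would call `try_acquire(len(batch))` before each PutRecordBatch call and sleep briefly when it returns `False`. This smooths spikes but does not replace requesting a higher quota when sustained volume exceeds it.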

For how long have I used the solution?

We've been using this solution since September 2019.

What do I think about the stability of the solution?

The stability is great. I'd say that maybe we have it running 99% of the time, and nothing stops it.

What do I think about the scalability of the solution?

Amazon Kinesis is definitely scalable. We have huge spikes of data that get processed around midnight and Kinesis handles it fine.

It automatically scales up and down, so we don't need to do anything for that. It's great.

How are customer service and technical support?

The only time that we needed to contact Amazon was to ask them to increase the throttling limit. They replied to us very quickly and did what we asked.

Which solution did I use previously and why did I switch?

Initially, we were evaluating Kafka. I think Kafka is faster, but it's harder to keep reliable from a maintenance standpoint. That said, when Kafka works and you have it properly configured, it's much better than Kinesis, to be honest.

Kinesis, on the other hand, is easier for us to maintain. Our DevOps team is already oversaturated, so we didn't want to increase the maintenance cost of the production environment. That's why we decided to go with Kinesis: it's easy to configure and maintain.

How was the initial setup?

I found this solution to be really easy to configure. The essential parts of the configuration are naming the stream and setting the buffering time before a record is ingested into S3 (how long it stays in the stream before it's written to S3). You also need to link an Amazon S3 bucket to the Amazon Kinesis stream. After you've completed these configurations, it's pretty much production-ready. It's very, very easy, and that's a huge advantage of using this service.
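The configuration described above (a stream name, a buffering time, and an S3 bucket) maps onto the arguments of boto3's `firehose.create_delivery_stream`. This sketch only builds the request, so it runs without AWS credentials. The stream name, bucket ARN, and IAM role ARN are placeholders, and the 120-second interval simply mirrors the reviewer's "roughly two minutes".

```python
def build_delivery_stream_request(name: str, bucket_arn: str, role_arn: str,
                                  interval_seconds: int = 120, size_mb: int = 5) -> dict:
    """Keyword arguments for boto3's firehose.create_delivery_stream (S3 destination)."""
    return {
        "DeliveryStreamName": name,
        # DirectPut: producers write to Firehose directly (no Data Streams source).
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,      # IAM role Firehose assumes to write to S3
            "BucketARN": bucket_arn,
            # Deliver when either threshold is reached, whichever comes first.
            "BufferingHints": {"IntervalInSeconds": interval_seconds, "SizeInMBs": size_mb},
            "CompressionFormat": "UNCOMPRESSED",
        },
    }


request = build_delivery_stream_request(
    name="example-stream",  # placeholder names/ARNs, not from the review
    bucket_arn="arn:aws:s3:::example-bucket",
    role_arn="arn:aws:iam::123456789012:role/example-firehose-role",
)
# With AWS credentials configured:
#   import boto3
#   boto3.client("firehose").create_delivery_stream(**request)
```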

What about the implementation team?

Deployment took a few minutes.

You don't need a deployment plan or an implementation strategy because once you configure it, you can just use the stream. It doesn't require a particular library version or anything like that; the stream is completely abstracted in that way. You only need to configure it once, and that's it.

What was our ROI?

We have seen a return on our investment with Amazon Kinesis. We are able to process data without any issues, and it's our solution for ingesting data into other databases, such as Snowflake.

Which other solutions did I evaluate?

We considered developing the streaming process manually or using Kafka.

What other advice do I have?

If you want to use a streaming solution, you need to evaluate your needs. If your needs are really performance-driven, maybe you should go with Kafka, but for near real-time performance, I would recommend Amazon Kinesis.

If you need more than one destination for the data that you are ingesting in the stream, you will need to use Amazon Kinesis Data Streams rather than Firehose. If you only want to integrate from one point to another, then Kinesis Firehose is a considerably cheaper option and is much easier to configure. 

From using Kinesis, I have learned a lot about the asynchronous way of processing data. We always had a more sequential way of doing things.

On a scale from one to ten, I would give this solution a rating of eight.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Jeff Levy - PeerSpot reviewer
Senior Engineering Consultant at a tech services company with 201-500 employees
Real User
Top 10
Easy to implement and use, with a robust and fault-tolerant data capturing facility
Pros and Cons
  • "The most valuable feature is that it has a pretty robust way of capturing things."
  • "If there were better documentation on optimal sharding strategies then it would be helpful."

What is our primary use case?

As part of my interest in obtaining Amazon certification and learning more about Kinesis, I am currently using it to capture streaming Twitter data.

I get an avalanche of tweets and I need some technology to harness and capture them. I have used the streaming Twitter API to deal with it. Twitter is updated every half a second, so I'm tapping into the streaming API and capturing a lot of stuff.

It has also been used for the Internet of Things (IoT), where devices emit a lot of streaming data and you need a mechanism to capture all of it. This includes things such as logs. My company was recently working on a project with Kinesis where we were capturing data from race cars.

These race cars were emitting tons of data, and it needed to be captured by some kind of tool for analytics. Kinesis was used to capture all of that information. The basic use case is just capturing the data. In the streams, you can do some interim transformations, but for the most part, the basic use case is capturing data and persisting it in a data store like Amazon S3. Another example is permanent storage on Elastic MapReduce. Once the data lands in some kind of permanent store, further transformations or aggregations can be done at that point.

How has it helped my organization?

In the race car project that we worked on, the client wanted to be able to capture metrics in real time to allow for adjusting racing strategy.

What is most valuable?

The most valuable feature is that it has a pretty robust way of capturing things. You can capture things from the beginning, or start capturing tweets at a certain point in time.

It has some good fault tolerance in case something breaks.

It's really easy to implement, get started, and use.

With AWS, you don't have to invest in any kind of infrastructure. All you have to do is go to the portal, create an account, turn it on, and use a few lines of Python code in order to capture what you're looking for.

The Kinesis API makes it really easy to put information onto the shards. You just need to enter a few lines of code.
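The "few lines of code" are typically a boto3 call such as `kinesis.put_record(StreamName=..., Data=..., PartitionKey=...)`. The partition key decides which shard receives the record. Kinesis hashes the key with MD5 into a 128-bit integer, and each shard owns a range of that hash space. The following sketch reproduces the mapping locally, assuming the hash space is split into equal-width ranges, which is how a freshly created stream is laid out. Splits and merges can make the ranges uneven.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # Kinesis hash keys are 128-bit integers


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a 128-bit unsigned integer."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard that owns this key, assuming equal-width hash ranges.

    Real streams can have uneven ranges after splits/merges; ListShards returns
    each shard's HashKeyRange if you need the exact mapping.
    """
    return hash_key(partition_key) * shard_count // HASH_KEY_SPACE
```

A high-cardinality key, such as a tweet or device ID, spreads load evenly. A constant key sends everything to one shard, regardless of how many shards the stream has.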

What needs improvement?

I'm currently trying to figure out production rates and consumption rates for data. If there were better documentation on optimal sharding strategies then it would be helpful. 

What do I think about the stability of the solution?

I think that this product is very stable and very fault-tolerant. 

As part of consuming data off of the stream, you do get some sort of unique number that is somewhat sequential. This means that if you have a problem with the data and something breaks, you can simply go back to that location in the stream.

Imagine that it gives you an integer, 100, to indicate your point in the stream. Then, if something fails, at a later point in time you can go back to spot 101 and continue retrieving data inside the stream. It's very fault-tolerant.
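In practice, Kinesis sequence numbers are large numeric strings that increase over time but are not consecutive. So rather than jumping to "101", a consumer resumes after the last sequence number it processed. The same API also covers the "from the beginning" and "from a point in time" options mentioned earlier. This sketch builds the arguments for boto3's `kinesis.get_shard_iterator`. The stream name, shard ID, and checkpoint value in the usage note are hypothetical.

```python
from datetime import datetime
from typing import Optional


def shard_iterator_args(stream_name: str, shard_id: str,
                        checkpoint: Optional[str] = None,
                        start_time: Optional[datetime] = None,
                        from_beginning: bool = False) -> dict:
    """Keyword arguments for boto3's kinesis.get_shard_iterator."""
    args = {"StreamName": stream_name, "ShardId": shard_id}
    if checkpoint is not None:
        # Resume just after the last record we processed successfully.
        args.update(ShardIteratorType="AFTER_SEQUENCE_NUMBER",
                    StartingSequenceNumber=checkpoint)
    elif start_time is not None:
        # Start at a point in time.
        args.update(ShardIteratorType="AT_TIMESTAMP", Timestamp=start_time)
    elif from_beginning:
        # Oldest record still retained in the shard.
        args["ShardIteratorType"] = "TRIM_HORIZON"
    else:
        # Only records that arrive from now on.
        args["ShardIteratorType"] = "LATEST"
    return args
```

A consumer would pass `kinesis.get_shard_iterator(**args)["ShardIterator"]` to `get_records`, process the batch, and persist the last record's `SequenceNumber` as the new checkpoint. The Kinesis Client Library (KCL) automates this checkpointing if you prefer not to hand-roll it.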

What do I think about the scalability of the solution?

The product is very scalable. Especially on the cloud, there is a large advantage.

How are customer service and technical support?

I haven't needed to contact technical or customer support.

Which solution did I use previously and why did I switch?

I am familiar with Kafka, although I have never used it.

Compared to Kafka, which requires physical servers, Kinesis, being on the cloud, is very easy to implement. It is a little easier to use, as well. Anybody who is interested in using it does not have to invest any money in a server or invest time in setting things up and configuring it on an actual environment with Kafka. All they have to do is go to AWS and turn it on.

I don't have any experience with other streaming analytics solutions.

How was the initial setup?

If someone knows what they're doing, they can have something up and running in half an hour. You can certainly use a deployment strategy, although I haven't to this point. I've just done it on my desktop, locally, in an IDE called PyCharm.

One can go ahead and deploy to an Amazon EC2 instance or AWS Elastic Beanstalk. I chose not to because running it locally is easier for my project.

What about the implementation team?

I think as far as maintenance is concerned, you just kind of have to watch the production and the consumption of your data. You just have to make sure that everything's in order. They have metrics on the AWS console to help keep an eye on that kind of stuff but once it's up and running, you really don't have to do a whole lot of maintenance.

What other advice do I have?

My advice for anybody who is implementing this product is to start by reading through the Amazon documentation, as well as watching some videos on YouTube or Pluralsight, just to get a high-level idea of what's going on. Then, start experimenting and trying to figure out how it works. From there, try to figure out your optimal sharding strategy, such as how many shards you need within the stream and how you want to partition the data within it.

I think from there, you need to look at your production and consumption rates on the stream. This is how much data you are putting onto the stream and at what kind of rate. You need to make sure that you're consuming data off of the stream, also, and look at that rate too.

Ideally, you want to be able to consume data faster than you produce it, because then you're in control. If you can't, you could get overwhelmed.
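The production and consumption rates above translate into a first-pass shard count. In provisioned mode, each shard accepts up to 1 MiB/s or 1,000 records/s of writes and serves 2 MiB/s of reads, shared by all standard consumers. The following rough estimator takes the largest of those constraints. It is a starting point only. On-demand mode manages shards for you, and enhanced fan-out gives each consumer its own 2 MiB/s, which this sketch does not model.

```python
import math

# Per-shard limits for provisioned-mode Kinesis Data Streams.
WRITE_MIB_PER_SHARD = 1.0        # 1 MiB/s of writes
WRITE_RECORDS_PER_SHARD = 1000   # 1,000 records/s of writes
READ_MIB_PER_SHARD = 2.0         # 2 MiB/s of reads, shared by standard consumers


def estimate_shards(avg_record_kib: float, records_per_sec: float, consumers: int = 1) -> int:
    """Rough shard count: the largest of the write-bytes, write-records and read caps."""
    write_mib = avg_record_kib * records_per_sec / 1024
    # Each standard (shared-throughput) consumer reads the whole stream.
    read_mib = write_mib * consumers
    return max(
        1,
        math.ceil(write_mib / WRITE_MIB_PER_SHARD),
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mib / READ_MIB_PER_SHARD),
    )
```

For example, 2,000 records/s of 1 KiB each needs 2 shards for writes. With three standard consumers, the read side pushes that to 3 shards, which is one way a stream ends up producing faster than it can be consumed.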

The biggest lesson that I learned from using this product is that it's a whole new world of processing big data. I come from a traditional data warehousing background where everything is batch-oriented, so this is a whole new ball game in terms of how to process data. It's a new mechanism for harnessing the power of data. A traditional data warehouse could not analyze, for example, what is going on in real time on a racing car. It's not scalable, and it's not going to work. However, something like this is dynamic and big enough to handle this kind of application.

This is a pretty good product, although I don't have much to compare it with. That said, I don't have any problems with it. It's done what I've asked of it, and it's easy to use.

I would rate this solution a nine out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Matt Newman - PeerSpot reviewer
Principal Data Engineer at a transportation company with 1,001-5,000 employees
Real User
Top 5
A great managed service that's simple and easy to maintain
Pros and Cons
  • "Everything is hosted and simple."
  • "Could include features that make it easier to scale."

What is our primary use case?

Our primary use case for this solution is a streaming bus architecture. Events come in through Kinesis as JSON. They're usually changes to a database, but they can be any event, such as one from our application. The events feed into Kinesis, a Lambda consumes them, and finally it puts them into a data warehouse, which is the ultimate goal. So it's a near real-time data warehouse.
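The Lambda stage of that bus can be sketched as follows. When Lambda reads from a Kinesis stream, each event holds a `Records` list, and each record's payload sits base64-encoded under `kinesis.data`. The sketch decodes the JSON events and hands them to `load_into_warehouse`, which is a hypothetical placeholder for whatever bulk write the warehouse needs.

```python
import base64
import json
from typing import Any, Dict, List


def decode_kinesis_event(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a Lambda Kinesis event into a list of JSON objects."""
    # Lambda delivers each record's payload base64-encoded under kinesis.data.
    return [json.loads(base64.b64decode(r["kinesis"]["data"])) for r in event["Records"]]


def load_into_warehouse(rows: List[Dict[str, Any]]) -> None:
    """Hypothetical placeholder for the warehouse write (e.g. a bulk INSERT)."""


def lambda_handler(event, context):
    rows = decode_kinesis_event(event)
    load_into_warehouse(rows)
    return {"processed": len(rows)}
```

If the handler raises, Lambda retries the whole batch by default. Making the warehouse write idempotent, or enabling partial batch responses (`ReportBatchItemFailures`), avoids duplicate rows on retry.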

How has it helped my organization?

Instead of running batch ETL jobs to move data into a data warehouse, we're now able to do it all through a continuous stream in Kinesis. This means the data in our data warehouse is more up to date; it's closer to real time.

What is most valuable?

I've used Kafka in the past, and Kinesis is a lot simpler. It's all hosted, and it's really good. There aren't too many knobs to turn and ways to screw up. It's a pretty simple product and a lot easier to manage because it's hosted by AWS, and it accomplishes what we need it to. The other nice thing is that we can make it available to external customers if they want to get a Kinesis feed of our data.

What needs improvement?

The solution has the capability to do sharding so that you can process a lot of things in parallel, but I think the way sharding works could be simplified, and it could include features that make it easier to scale in parallel.

For how long have I used the solution?

I've been using this solution for five years. 

What do I think about the stability of the solution?

It's much easier to maintain than a Kafka cluster, but there were some nuances with it. Maybe once a month or once every couple of months, we'd get some weird things, like AWS getting overwhelmed or the service being degraded for a few hours in a day. I guess that's a drawback of going with a fully managed service: you depend on the provider keeping everything up. To me, it's pretty stable; it would just get kind of slow once a month or once every couple of months. Overall, I would say it's not perfect, but it's totally acceptable.

What do I think about the scalability of the solution?

You can use a lot of shards. We only use one or a few, but you can scale way up. It's far more scalable than we ever needed, and we were doing millions of updates per minute. This is a backend service, so it's mostly developers and data engineers in our company who use Kinesis, but all of our customers benefit. Customers send data through it without knowing that they are using Kinesis. We built the system and they use it, so they don't know that under the covers it's Kinesis. We have five to 10 internal employees using it, but probably around 5,000 customers. Where I worked before, we used it basically everywhere; probably 70% of all of our data was flowing through Kinesis. Where I am now, we're just starting, so it's not yet widely used. We plan to increase usage.

How are customer service and technical support?

When we had some of those slowdowns, we used AWS support. I can't say that we had a great experience or that they resolved the issues. They looked into some of the slowdowns, and ultimately we decided there was nothing we could do. It left me feeling there was something lacking on the technical support side. They didn't get to the bottom of all my issues, but the issues weren't bad enough to make me unhappy with the product overall.

Which solution did I use previously and why did I switch?

We previously used Apache Kafka. We switched because we were already using Amazon for everything else, so it made sense, and it was a nice managed solution that would be a lot easier to deal with. It also integrated well with Lambda. The whole AWS ecosystem is nice to work with because everything integrates with everything else.

How was the initial setup?

The initial setup is definitely straightforward because it's a managed service and you only get a few options when you set up a Kinesis stream. It's a lot less overwhelming than setting up a whole Kafka cluster. Even with managed Kafka, there's still a lot of configuration required to get it up and running, and a lot of choices to make about topics and so on. Kinesis is just much simpler. It only lets you configure what you need to configure. I'd say the POC took about a week, and real production probably a month. We used Terraform for our implementation, but I used CloudFormation at my past job, where the deployment was essentially running the CloudFormation template.

What was our ROI?

Compared to what we were doing with Kafka, which was taking about 30% just to keep things together, I think we're probably saving tens of thousands of dollars, if not $100,000, per year with Kinesis.

What's my experience with pricing, setup cost, and licensing?

I would say pricing is really great. If pricing is an issue, I'd definitely recommend Kinesis because our Kinesis costs are under $1,000 a month. The product is super cost effective and it's the same with the licensing. Compared to Google Cloud and Azure, they're probably pretty similarly priced. I wouldn't say you're going to get a huge benefit going to Kinesis, but if you're considering using Kafka or another solution that's not hosted, it's not really worth all the effort when you could just go with a managed solution. It's a lot better cost-wise.

Which other solutions did I evaluate?

We took a look at the Google Cloud and Azure pub/sub solutions, but not really in depth. Because we were already using Amazon, it just didn't make sense to use any other cloud provider.

What other advice do I have?

It's nice to deploy this with infrastructure-as-code tools like CloudFormation and Terraform, so that it's all deployed in a repeatable way. I know it's easy to go into the console and do it manually, but it's best to use infrastructure as code, particularly with Kinesis.
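As a small illustration of the infrastructure-as-code approach, the sketch below generates a CloudFormation template for a single `AWS::Kinesis::Stream` from Python, so shard count and retention live in version control. The logical ID and stream name are hypothetical. The template could then be deployed with something like `aws cloudformation deploy --template-file stream.json --stack-name events-stream`. A Terraform `aws_kinesis_stream` resource would serve the same purpose.

```python
import json


def kinesis_stream_template(logical_id: str, stream_name: str,
                            shard_count: int = 1, retention_hours: int = 24) -> dict:
    """CloudFormation template (as a dict) declaring one Kinesis data stream."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            logical_id: {
                "Type": "AWS::Kinesis::Stream",
                "Properties": {
                    "Name": stream_name,
                    "ShardCount": shard_count,
                    "RetentionPeriodHours": retention_hours,
                },
            }
        },
        # Export the stream ARN so producers/consumers can reference it.
        "Outputs": {"StreamArn": {"Value": {"Fn::GetAtt": [logical_id, "Arn"]}}},
    }


if __name__ == "__main__":
    # Write this to stream.json and deploy it with the AWS CLI.
    print(json.dumps(kinesis_stream_template("EventsStream", "events-stream"), indent=2))
```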

I would rate this solution a nine out of 10. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Real User
Top 10
User friendly and feature rich solution
Pros and Cons
  • "Its scalability is very high. There is no maintenance and there is no throughput latency. I think data scalability is high, too. You can ingest gigabytes of data within seconds or milliseconds."
  • "Kinesis Data Analytics needs some improvement. It's SQL-based, but it is not as user-friendly as MySQL or Athena tools."

What is our primary use case?

One use case is consuming sales data and then writing it to S3. That's one small use case that we have: from Kinesis Data Streams to Kinesis Data Firehose, and from Data Firehose to Amazon S3.

We have clickstream data coming in. For the clickstream data, we established Kinesis Data Streams, and from Kinesis Data Streams, we dump data into S3 using Kinesis Data Firehose. This is our main use case. We did many POCs on Kinesis as well. There is also one more live project running on Amazon that uses a DynamoDB database. DynamoDB triggers automatically invoke Lambda, Lambda calls Kinesis, and Kinesis writes the data to S3. This is another use case.
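The DynamoDB → Lambda → Kinesis hop could look roughly like this. A DynamoDB Streams event delivered to Lambda has `eventName` (INSERT, MODIFY, or REMOVE) and a `dynamodb` block with `Keys` and, depending on the stream view type, `NewImage`. The sketch turns those records into entries for boto3's `kinesis.put_records`, using the item key as the partition key so that changes to one item stay in order on one shard. The field names in the output body are our own choice.

```python
import json
from typing import Any, Dict, List


def to_kinesis_entries(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map DynamoDB Streams records (as delivered to Lambda) to PutRecords entries."""
    entries = []
    for record in event["Records"]:
        change = record["dynamodb"]
        body = {
            "eventName": record["eventName"],        # INSERT, MODIFY or REMOVE
            "keys": change["Keys"],
            # Absent for REMOVE events or when the stream view excludes new images.
            "newImage": change.get("NewImage"),
        }
        entries.append({
            "Data": json.dumps(body).encode("utf-8"),
            # Same item key -> same partition key -> same shard, so changes to one
            # item stay in order. Partition keys are limited to 256 characters.
            "PartitionKey": json.dumps(change["Keys"], sort_keys=True),
        })
    return entries
```

The entries would be sent with `kinesis.put_records(StreamName="example-stream", Records=entries)`, where the stream name is a placeholder. That call accepts at most 500 entries, so larger Lambda batches need chunking, and the response's `FailedRecordCount` should be checked for partial failures.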

We also use Kinesis Data Analytics, where you can analyze streaming data directly. For that, we use a Kinesis data producer that connects to the data stream, and the data stream feeds the SQL application in Kinesis Data Analytics. From Kinesis Data Analytics, we establish a connection to Data Firehose, which then writes the data to S3. These are the main use cases that we have for Amazon Kinesis.

How has it helped my organization?

At my client's company, there is a live database in DynamoDB. They want to replicate it in Amazon S3 for their data analytics, but they do not want the data refreshed every second; they want it refreshed at a particular size, such as 5 MB. Kinesis provides that. That's the main improvement we deliver to the client.

What is most valuable?

The features that I have found most valuable depend on the use case. I find Data Firehose and Data Streams much more intelligent than other streaming solutions.

There is a time-based buffering option as well as a size-based one. Suppose you want to deliver data every 60 seconds: you can. Suppose you want to deliver data once it reaches a certain size: you can do that, too. Then it writes the data to S3. That's the beauty of the solution.
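The "time or size, whichever comes first" behavior can be modeled locally with a tiny buffer. This is a simplified illustration of Firehose's buffering hints (`IntervalInSeconds`, `SizeInMBs`), not the service's actual implementation. Time is passed in explicitly so the logic is easy to test.

```python
from typing import List, Optional


class SizeOrTimeBuffer:
    """Flush when either the size limit or the age limit is hit, whichever is first."""

    def __init__(self, max_bytes: int, max_age_seconds: float):
        self.max_bytes = max_bytes
        self.max_age = max_age_seconds
        self.items: List[bytes] = []
        self.size = 0
        self.opened_at: Optional[float] = None  # time the first item arrived

    def add(self, data: bytes, now: float) -> Optional[bytes]:
        """Buffer data; return the flushed batch if a limit was reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.items.append(data)
        self.size += len(data)
        if self.size >= self.max_bytes or now - self.opened_at >= self.max_age:
            return self.flush()
        return None

    def tick(self, now: float) -> Optional[bytes]:
        """Called periodically: flush a non-empty buffer that has grown too old."""
        if self.items and now - self.opened_at >= self.max_age:
            return self.flush()
        return None

    def flush(self) -> bytes:
        batch = b"".join(self.items)
        self.items, self.size, self.opened_at = [], 0, None
        return batch
```

With `SizeOrTimeBuffer(5 * 1024 * 1024, 60)`, a busy stream flushes on size, while a quiet one still delivers at least once a minute. That matches the two cases described above.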

What needs improvement?

Kinesis Data Analytics needs some improvement. It's SQL-based, but it is not as user-friendly as MySQL or Athena tools. That's the one improvement I'm expecting from Amazon. Apart from that, everything is fine.

For how long have I used the solution?

I have two years of project experience on AWS, and around six months with Kinesis.

What do I think about the stability of the solution?

I am satisfied with Amazon Kinesis. It is pretty exciting to work on.

What do I think about the scalability of the solution?

Its scalability is very high. There is no maintenance and there is no throughput latency. I think data scalability is high, too. You can ingest gigabytes of data within seconds or milliseconds.

We are a team of five members using Amazon Kinesis. Two are working onshore and three of us are working offshore.

We are all developers, implementing, developing, and designing the data pipeline. We work at a startup, so we have to do everything ourselves.

How are customer service and technical support?

As of now we have not had any contact with customer support because we didn't face any complex types of problems while we were implementing our use cases.

How was the initial setup?

The initial setup is very straightforward. It is very well documented and anyone with simple knowledge or common sense can do it.

Implementing is very simple. You can just do it with your fingertips. There might be some improvements that can be made according to the requirements. For that, we do versioning. First we establish the pipeline from the data stream to the S3. That's very easy. You can do it within hours or within minutes. I can say the process is very simple and it's not as complex as it looks.

Another great thing is that Kinesis Data Firehose writes directly to S3 in a partitioned way. Based on the timestamp, it writes objects under year, month, day, and hour prefixes. That's the good thing I found about Amazon Kinesis.
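Firehose's default object-key layout, per the AWS documentation, is a UTC `YYYY/MM/DD/HH/` prefix. Custom prefixes can use expressions like `!{timestamp:yyyy}` to produce Hive-style `year=/month=/...` paths instead. The following sketch reproduces both layouts locally, which can help when computing which S3 prefix holds a given hour of data. Input datetimes are assumed to be timezone-aware.

```python
from datetime import datetime, timezone


def default_prefix(arrival: datetime) -> str:
    """Firehose's default S3 key prefix: YYYY/MM/DD/HH/ in UTC."""
    return arrival.astimezone(timezone.utc).strftime("%Y/%m/%d/%H/")


def hive_prefix(arrival: datetime) -> str:
    """Hive-style variant, as produced by a custom prefix such as
    year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/
    """
    return arrival.astimezone(timezone.utc).strftime("year=%Y/month=%m/day=%d/hour=%H/")
```

Because both prefixes are computed in UTC, a query for "yesterday" in local time may need to scan two UTC day prefixes.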

We follow a set implementation process. We deploy directly to Dev, and once we have our results and have gone through QA, we roll it out to the other environments.

What was our ROI?

Our clients definitely see a return on their investment with Amazon Kinesis.

What's my experience with pricing, setup cost, and licensing?

The pricing depends on the number of shards that we provision and how long the application is running.

We reduced the cost of the pipeline that we built. We built a generic pipeline so that two more teams can use the same data pipeline.

What other advice do I have?

My advice to anyone thinking about Amazon Kinesis is that if they have clickstream or any other streaming data, ranging from megabytes to gigabytes, they can definitely go for Amazon Kinesis. If they want to do data processing, or batch or streaming analytics, they can choose Amazon Kinesis. And if you want to enable database stream events in Amazon DynamoDB, you can definitely go for Amazon Kinesis. I don't see a better option for these than Amazon Kinesis. Another plus is that you can use Amazon Kinesis Data Analytics to detect anomalies before you process the data. The first things to determine are the data source, the throughput, and the latency you need.

On a scale of one to ten I would rate Amazon Kinesis a nine.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Akinola McLean - PeerSpot reviewer
Chapter Lead - Data and Infrastructure (Head of Department) at a tech services company with 51-200 employees
Real User
Top 10
Enables us to respond in real time; great auto-scaling feature
Pros and Cons
  • "Great auto-scaling, auto-sharding, and auto-correction features."
  • "Lacks first in, first out queuing."

What is our primary use case?

Our primary use case for this solution is as an integral part of our data pipeline, to deal with all of our big data problems. The traffic in our industry is highly volatile. At any given time, we could have 10,000 users, and five minutes later it could be 100,000. We need systems fast enough to deal with that elasticity of demand and able to handle all the big data problems: volume, velocity, veracity, things like that. That's where we use the Kinesis platform. It comes in different variants. Standard Kinesis Data Streams is a little more manual, and we use it for our legacy technology; for the more recent systems, we use Kinesis Firehose.

How has it helped my organization?

We dynamically change some of our product offerings based on user interaction. We can respond faster to user behavior, rather than waiting for the data to be at rest. We run some analytics models, and can then react in real time.

What is most valuable?

When it comes to Kinesis Firehose, the most valuable feature is auto-scaling. It does auto-sharding, auto-correction, and things like that, and responds dynamically. Secondly, it innately has all the features of a reliable data pipeline, allowing you to store raw documents and transform data on the fly. When data comes into the stream through Firehose, we can see and analyze every single object, keep the raw objects, carry out some transformations in flight, and then put the data at rest. It allows us to do real-time analytics using Kinesis Analytics, and we do anomaly detection in flight as well. We see any changes in user patterns and behaviors in real time because Kinesis allows that.

What needs improvement?

They recently expanded the feature sets, but when we were implementing it, it could only deliver to one platform. I'm not sure where it's at now but multiple platforms would be beneficial. I'd also like to have some ability to do first in, first out queuing. If I put several messages into Firehose, there's no guarantee that everything will be processed in the order it was sent. 
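On the FIFO point: Firehose makes no ordering promise, as the reviewer says, but Kinesis Data Streams preserves order within a shard, and all records with the same partition key land on the same shard. So using an entity ID (such as a user ID) as the partition key gives per-entity ordering. If records for one key get interleaved downstream, for example after merging batches, they can be restored by sequence number. This is a minimal sketch over records shaped like boto3's `get_records` output (`PartitionKey`, `SequenceNumber`).

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def order_by_key(records: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Group records by partition key, each group in sequence-number order."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for rec in records:
        groups[rec["PartitionKey"]].append(rec)
    # Sequence numbers are numeric strings; compare them as integers, not text.
    return {key: sorted(group, key=lambda r: int(r["SequenceNumber"]))
            for key, group in groups.items()}
```

This only restores order within a key. Kinesis does not define a global order across partition keys or shards.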

What do I think about the stability of the solution?

We've had no problems with stability and we implemented well over a year ago. 

What do I think about the scalability of the solution?

The scalability of this solution is good. We are using it extensively, with pretty much every single one of our flows.

How are customer service and technical support?

The technical support could be improved. They tend to send you back to the documentation. 

Which solution did I use previously and why did I switch?

We switched to Kinesis because of the technical complexity of the previous solution. In the previous solution, Ops would write feeds to an SQS queue, and then physical machines were required to connect, pull the data, transform it, and write it. That required three or four different technologies. Kinesis has removed a lot of technical complexity from the architecture.

How was the initial setup?

The initial setup was straightforward. Both the user interface and the programmatic access are very intuitive. And again, it's not difficult; even non-technical people would be able to set it up. It took two people to implement. I was responsible for data architecture, and we had a developer to transform the data inside. Deployment took less than an hour. The documentation was very helpful.

What was our ROI?

We've been able to drop our costs for ingesting data by about 60 to 70%.

Which other solutions did I evaluate?

We didn't evaluate anything else because no other product offered that type of fast solution at the time. Whatever we looked at added technical complexity to the architecture.

What other advice do I have?

It's important to think about how you are going to set up the endpoints that connect to your Kinesis streams.

I would rate this solution a nine out of 10. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Big Data Architect
Real User
Top 10
Great for large environments and has good configuration but needs an experienced person to set it up
Pros and Cons
  • "The solution works well in rather sizable environments."
  • "In order to do a successful setup, the person handling the implementation needs to know the solution very well. You can't just come into it blind and with little to no experience."

What is our primary use case?

We use this solution for quite large environments.

We use it to capture and process a lot of data; for example, for data analytics and to query and analyze stream data.

How has it helped my organization?

We are a sizable organization and as such, have a lot of data. The solution allows for real-time analysis and you can use a scaler to handle data flows. 

What is most valuable?

The solution is very flexible and allows for a lot of configuration. It just offers up a lot of possibilities.

I'm using Amazon S3 and Redshift using Amazon server. I can make large configurations and update in near real-time, so that we have real-time use for batch intervals. 

The solution is great for scanning in order to handle environmental data.

The data stream feature on offer is excellent. We use it quite extensively.

The solution works well in rather sizable environments. We deal with a lot of data and it handles it very well.

The solution has a very good alerts system to allow us to respond in real-time.

The dashboards are excellent.

The solution offers very good data capture and integrates well with Power BI and Tableau, for example.

The product makes it very easy to create jobs.

What needs improvement?

The automation could be better. The solution needs to be better at information capture.

Some jobs have limitations which can make the process a bit challenging.

In order to do a successful setup, the person handling the implementation needs to know the solution very well. You can't just come into it blind and with little to no experience.

For how long have I used the solution?

I've used the solution for six or seven years or so.

What do I think about the scalability of the solution?

We work with very large environments and haven't had any issues with feeling constricted by the solution.

How was the initial setup?

Personally, based on my past experience and my long history with the solution, the initial setup was not complex. It was pretty straightforward. I find it very easy to use these tools.

A user will need to understand how to create analytics that process a large amount of information. There may be legacy solutions in the mix as well. A new user will need to understand the environment and all of the requirements before really digging in.

What I will need, basically, is a data map, where I can find any legacy data. From there I can do the setup and it goes pretty smoothly.

What about the implementation team?

I handle the implementation myself.

Which other solutions did I evaluate?

You can compare this solution to Data Factory and Hadoop. They have a few overlapping characteristics. However, for my industry, Hadoop, for example, wouldn't work as it was lacking some characteristics and parameters and some understanding of the industry itself.

What other advice do I have?

I have a lot of experience in Kinesis and data analytics including in networking in the Amazon AWS environment. My experience is as a big data architect. I draw all environments in AWS. 

On a scale from one to ten, I would rate the solution at a six. It's pretty good, and great for big environments, however, you do need to be well versed in the product to set it up.

Disclosure: I am a real user, and this review is based on my own experience and opinions.