reviewer1494531 - PeerSpot reviewer
Head of Data Science at an energy/utilities company with 10,001+ employees
Real User
Easy deployment and installation, and open-source, but the API is underdeveloped
Pros and Cons
  • "The setup was not too difficult."
  • "In a future release, they could improve on making the error descriptions more clear."

What is our primary use case?

I use the solution for detection on streaming data.

What needs improvement?

I am using the Python API, and I have found the solution to be underdeveloped compared to others. There needs to be better integration with notebooks to allow for more practical development. Additionally, there are no managed services; on Azure, for example, you have to set everything up yourself.

In a future release, they could improve on making the error descriptions more clear.

For how long have I used the solution?

I have been using the solution for two weeks.

Which solution did I use previously and why did I switch?

I have used many different competing solutions in the past, such as PySpark, Hadoop, and Storm.


How was the initial setup?

The setup was not too difficult. 

What about the implementation team?

Deploying the solution locally was relatively easy.

What's my experience with pricing, setup cost, and licensing?

The solution is open-source, so it is free.

What other advice do I have?

When choosing this solution, you have to look at your use case to see whether it is the best choice for you. If you need super-fast, real-time streaming and you can develop in Scala, then it might make a lot of sense to use it. If you can tolerate delays of seconds and you are working in Python, then PySpark might be a better solution.

I rate Apache Flink a six out of ten.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.
PeerSpot user
Lead Software Engineer at a tech services company with 5,001-10,000 employees
Real User
Drastically reduces turnaround and processing time; the documentation is in-depth and most things are straightforward
Pros and Cons
  • "The event processing function is the most useful or the most used function. The filter function and the mapping function are also very useful because we have a lot of data to transform. For example, we store a lot of information about a person, and when we want to retrieve this person's details, we need all the details. In the map function, we can actually map all persons based on their age group. That's why the mapping function is very useful. We can really get a lot of events, and then we keep on doing what we need to do."
  • "The TimeWindow feature is a bit tricky. The timing of the content and the windowing is a bit changed in 1.11. They have introduced watermarks. A watermark is basically associating every data with a timestamp. The timestamp could be anything, and we can provide the timestamp. So, whenever I receive a tweet, I can actually assign a timestamp, like what time did I get that tweet. The watermark helps us to uniquely identify the data. Watermarks are tricky if you use multiple events in the pipeline. For example, you have three resources from different locations, and you want to combine all those inputs and also perform some kind of logic. When you have more than one input screen and you want to collect all the information together, you have to apply TimeWindow all. That means that all the events from the upstream or from the up sources should be in that TimeWindow, and they were coming back. Internally, it is a batch of events that may be getting collected every five minutes or whatever timing is given. Sometimes, the use case for TimeWindow is a bit tricky. It depends on the application as well as on how people have given this TimeWindow. This kind of documentation is not updated. Even the test case documentation is a bit wrong. It doesn't work. Flink has updated the version of Apache Flink, but they have not updated the testing documentation. Therefore, I have to manually understand it. We have also been exploring failure handling. I was looking into changelogs for which they have posted the future plans and what are they going to deliver. We have two concerns regarding this, which have been noted down. I hope in the future that they will provide this functionality. Integration of Apache Flink with other metric services or failure handling data tools needs some kind of update or its in-depth knowledge is required in the documentation. We have a use case where we want to actually analyze or get analytics about how much data we process and how many failures we have. For that, we need to use Tomcat, which is an analytics tool for implementing counters. We can manage reports in the analyzer. This kind of integration is pretty much straightforward. They say that people must be well familiar with all the things before using this type of integration. They have given this complete file, which you can update, but it took some time. There is a learning curve with it, which consumed a lot of time. It is evolving to a newer version, but the documentation is not demonstrating that update. The documentation is not well incorporated. Hopefully, these things will get resolved now that they are implementing it. Failure is another area where it is a bit rigid or not that flexible. We never use this for scaling because complexity is very high in case of a failure. Processing and providing the scaled data back to Apache Flink is a bit challenging. They have this concept of offsetting, which could be simplified."

What is our primary use case?

For services that need real-time, fast updates and have a lot of data to process, Flink is the way to go. Apache Flink with Kubernetes is a good combination. Data transformation, grouping, keying, and state management are some of Flink's features. My use case is to provide the latest data as fast as possible, in real time.

How has it helped my organization?

The main advantage is the turnaround time, which has been reduced drastically because of Apache Flink. It used to take a lot of processing time, but now everything happens in almost real time. We get the latest data in much less time, and there is no waiting or lag of data in the application. Time has been one of the most important factors.

The other factor is memory. Machine utilization has been more efficient since we started using this solution. Big data applications use a large group of machines to process data, and these machines are not always optimally utilized; some of them might not be required, but they still hold on to resources. In Kubernetes, we can specify resources, and in Apache Flink, there is a configuration that lets you deploy in combination with a single cluster node. Scalability is quite flexible in Flink through task managers and resource configuration.
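As a rough illustration of the resource configuration involved, a few of the standard flink-conf.yaml knobs look like this (the values here are placeholders for illustration, not recommendations):

```yaml
# flink-conf.yaml -- illustrative values only
jobmanager.memory.process.size: 1600m   # total memory for the JobManager process
taskmanager.memory.process.size: 1728m  # total memory for each TaskManager process
taskmanager.numberOfTaskSlots: 2        # parallel slots offered per TaskManager
parallelism.default: 2                  # default parallelism for submitted jobs
```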

What is most valuable?

MapFunction and FilterFunction are the most useful, and most used, functions in Flink. Data transformation becomes easy. For example, in an application that stores information about people and wants to retrieve a person's details by some relation, the map function lets us label all persons by their age group. That's why the mapping function is very useful. This could be helpful in analytics, for instance, to target specific news to a specific age group, as in the sketch below.
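A minimal sketch of that idea using Flink's Java DataStream API; the Person type, the names, and the age-group boundaries are made up for illustration:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class AgeGroupJob {

    // Hypothetical POJO, used only for illustration.
    public static class Person {
        public String name;
        public int age;
        public Person() {}
        public Person(String name, int age) { this.name = name; this.age = age; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Person> people = env.fromElements(
                new Person("Ada", 34), new Person("Alan", 17), new Person("Grace", 52));

        // FilterFunction: keep only adults; MapFunction: label each person with an age group.
        people.filter(p -> p.age >= 18)
              .map(p -> p.name + " -> " + (p.age < 40 ? "18-39" : "40+"))
              .print();

        env.execute("age-group-demo");
    }
}
```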

What needs improvement?

The TimeWindow feature could be improved. The timing and windowing behavior changed a bit in 1.11, which introduced watermarks.

A watermark basically associates data in the stream with a timestamp. The documentation can be consulted, but they have updated the rest of the documentation and not the testing documentation. Therefore, we have to experiment manually to understand a few concepts.
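A minimal sketch of the 1.11-style WatermarkStrategy combined with an event-time window; the Tweet type and its createdAtMillis field are hypothetical stand-ins for a record that carries its own event time:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TweetWindows {

    // Hypothetical record type carrying its own event time.
    public static class Tweet {
        public String user;
        public long createdAtMillis;
        public Tweet() {}
        public Tweet(String user, long createdAtMillis) { this.user = user; this.createdAtMillis = createdAtMillis; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Needed on 1.11, where processing time is still the default characteristic.
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        DataStream<Tweet> tweets = env.fromElements(
                new Tweet("a", 1_000L), new Tweet("b", 3_000L), new Tweet("a", 7_000L));

        // Watermarks: tag every tweet with its own timestamp, tolerating 5 s of out-of-order data.
        DataStream<Tweet> stamped = tweets.assignTimestampsAndWatermarks(
                WatermarkStrategy.<Tweet>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                        .withTimestampAssigner((tweet, recordTs) -> tweet.createdAtMillis));

        // Count tweets per user in 10-second event-time windows.
        stamped.map(t -> Tuple2.of(t.user, 1))
               .returns(Types.TUPLE(Types.STRING, Types.INT))
               .keyBy(t -> t.f0)
               .window(TumblingEventTimeWindows.of(Time.seconds(10)))
               .sum(1)
               .print();

        env.execute("tweet-window-demo");
    }
}
```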

Integration of Apache Flink with other metric services or failure-handling tools needs some updating in the documentation, or in-depth knowledge is expected before integrating. Consider a use case where you want to get analytics about how much data you have processed and how much has failed. Prometheus is one of the common metric tools supported out of the box by Flink, along with other metric services. The documentation is straightforward, but there is a learning curve with metric services, which can consume a lot of time if you are not well versed in those tools.
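A minimal sketch of how such counters look in code, using Flink's metric API; the class and metric names are illustrative:

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.metrics.Counter;

// Counts processed and failed records; any configured reporter
// (e.g. flink-metrics-prometheus) exposes them automatically.
public class CountingMapper extends RichMapFunction<String, String> {

    private transient Counter processed;
    private transient Counter failed;

    @Override
    public void open(Configuration parameters) {
        // Registered under this operator's metric group.
        processed = getRuntimeContext().getMetricGroup().counter("recordsProcessed");
        failed = getRuntimeContext().getMetricGroup().counter("recordsFailed");
    }

    @Override
    public String map(String value) {
        try {
            String result = value.trim(); // stand-in for real transformation logic
            processed.inc();
            return result;
        } catch (RuntimeException e) {
            failed.inc();
            throw e;
        }
    }
}
```

The reporter itself is enabled in flink-conf.yaml (for Prometheus: metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter), after which the counters appear on the reporter's scrape endpoint.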

Basic failure-handling documentation is provided by Flink, covering options such as restart on task failure and fixed-delay restart.
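For example, the fixed-delay strategy can be set either in flink-conf.yaml or directly on the execution environment; a minimal sketch with placeholder values:

```java
import java.util.concurrent.TimeUnit;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartConfig {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // On task failure, restart the job up to 3 times, waiting 10 s between attempts.
        // Equivalent flink-conf.yaml settings:
        //   restart-strategy: fixed-delay
        //   restart-strategy.fixed-delay.attempts: 3
        //   restart-strategy.fixed-delay.delay: 10 s
        env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));
    }
}
```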

For how long have I used the solution?

I have been using Apache Flink for almost nine months.

What do I think about the stability of the solution?

The uptime of our services has increased, resources are better utilized, restarts are automated on failures, alerts are triggered when the infrastructure breaches a threshold, and application failure metrics are logged, all of which has made the application robust. Scaling can be tweaked according to application usage and business rules, and Flink parameters are configurable. Altogether, it has made our application more stable and maintainable.

What do I think about the scalability of the solution?

I haven't actually run many performance or stress tests to see how it scales.
There is no detailed documentation for scaling, and there is no fixed solution either; it depends on the use case. The Flink documentation describes REST APIs to scale your tasks dynamically, but I haven't personally tried them yet.

How are customer service and technical support?

I haven't actually used their technical support yet.

Which solution did I use previously and why did I switch?

I have tried batch processing, but it was not that effective for my use case. It was time-consuming and not a real-time solution.

How was the initial setup?

The initial setup is straightforward. A newly joined person (with some experience in the software industry) would not find it difficult to understand and then contribute to the application. When you want to start writing code, things get trickier and more complex, because if you are not familiar with what Apache Flink provides and what you need to do, it is very difficult. If you follow the documentation, there are various examples provided, as well as different deployment strategies.

What was our ROI?

It is a good solution for saving time and cost.

What's my experience with pricing, setup cost, and licensing?

Being open-source licensed, cost is not a factor, and the community support is strong.

What other advice do I have?

Use it to get your hands wet with streaming and big data processing applications, and to understand the basic concepts of big data processing and how complex analytics can be made simple. For example, if you want to analyze tweets or patterns, a simple use case is to use the flink-twitter-connector and provide it as your input source to Flink. A stream of random tweets keeps coming in, and you can then apply your own grouping, keying, and filtering logic to understand the concepts, as in the sketch below.
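A minimal sketch of that tweet-analysis starter, assuming the flink-connector-twitter dependency is on the classpath and real API credentials replace the placeholders:

```java
import java.util.Properties;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.twitter.TwitterSource;

public class TwitterStarter {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder credentials -- supply your own Twitter API keys.
        Properties props = new Properties();
        props.setProperty(TwitterSource.CONSUMER_KEY, "<consumer-key>");
        props.setProperty(TwitterSource.CONSUMER_SECRET, "<consumer-secret>");
        props.setProperty(TwitterSource.TOKEN, "<token>");
        props.setProperty(TwitterSource.TOKEN_SECRET, "<token-secret>");

        // Each record is the raw JSON of one tweet.
        DataStream<String> tweets = env.addSource(new TwitterSource(props));

        // Toy filtering logic: keep only tweets marked as English.
        tweets.filter(json -> json.contains("\"lang\":\"en\""))
              .print();

        env.execute("twitter-starter");
    }
}
```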

An important thing I learned while using Flink is that the basic concepts of windowing, transformations, and the DataStream API should be clear, or you should at least be aware of what is going to be used in your application; otherwise, you might end up increasing processing time rather than decreasing it. You should also understand your data, process, pipeline, and flow: is Flink the right candidate for your architecture, or is it overkill?
It is flexible and powerful is all I can say.

Disclosure: My company does not have a business relationship with this vendor other than being a customer.