
What needs improvement with Apache Flink?

Miriam Tover - PeerSpot reviewer
PeerSpot user

14 Answers

AC
Real User
Top 5
Feb 5, 2024

Apache Flink should improve its data-handling capabilities and its data migration support.

PrashantVaghela - PeerSpot reviewer
Real User
Top 10
Nov 20, 2023

One of the ways to interact with Flink is through PyFlink, its Python API, which lets you write Flink code without using the Java or Scala APIs directly. PyFlink provides a simpler and more accessible way to write Flink code compared to the Java or Scala APIs. However, it is not as fully featured as those APIs, so there are some limitations to what you can do with it. This is an area for improvement. Still, it is a good choice for users who are not familiar with Java or Scala.
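For readers who have not seen it, a minimal PyFlink Table API job looks roughly like the following. This is a sketch, not a definitive example: it assumes the `apache-flink` package is installed, and it falls back to a plain-Python version of the same transformation when PyFlink is unavailable.

```python
# Minimal sketch of a PyFlink Table API job. Assumes the `apache-flink`
# package is installed; if it is not, only the plain-Python comparison runs.
try:
    from pyflink.table import EnvironmentSettings, TableEnvironment
    HAVE_PYFLINK = True
except ImportError:
    HAVE_PYFLINK = False

def word_lengths(words):
    # The same transformation expressed in plain Python, for comparison.
    return [(w, len(w)) for w in words]

if HAVE_PYFLINK:
    env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
    table = env.from_elements([("flink",), ("pyflink",)], ["word"])
    # collect() returns Row objects once the job executes
    for row in table.execute().collect():
        print(row)

print(word_lengths(["flink", "pyflink"]))
```

The point the reviewer makes is visible even in this toy: the Python surface is small and approachable, but advanced features still tend to require the Java or Scala APIs.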

ZHIZHENG - PeerSpot reviewer
Real User
Top 20
Mar 9, 2023

Apache Flink's documentation should be available in more languages.

Sunil Morya - PeerSpot reviewer
Real User
Top 20
Nov 18, 2022

The issue we had with Flink was that when you had to apply a schema to the input data stream, it had to be done directly in code. The schema, which is stored in XLS format, had to be embedded in the Python code. If the schema changes, you have to redeploy Flink because the underlying tasks and jobs are already running. That's one disadvantage.

Another was a restriction with Amazon's CloudFormation templates, which don't allow for direct deployment in a private subnet. You have to deploy into the public subnet and then, from the Amazon console, specify a different private subnet, which requires a lot of settings. In general, the integration with Amazon products was not good and was very time-consuming. I'd like to think that has changed.
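A common workaround for the hard-coded-schema problem the reviewer describes is to load the schema from an external document at job startup, so that a schema change means updating the document rather than redeploying the job. The sketch below illustrates the idea in plain Python; the field names and the JSON schema format are hypothetical, not Flink's actual schema mechanism.

```python
import io
import json

# Hypothetical schema document, loaded at startup instead of being
# hard-coded into the job. In practice this could live in a file, an
# object store, or a schema registry.
SCHEMA_DOC = io.StringIO(json.dumps({
    "fields": [
        {"name": "user_id", "type": "int"},
        {"name": "event", "type": "str"},
    ]
}))

def load_schema(fp):
    """Read a schema document and return {field_name: python_type}."""
    types = {"int": int, "str": str}
    doc = json.load(fp)
    return {f["name"]: types[f["type"]] for f in doc["fields"]}

def parse_record(raw, schema):
    """Cast a raw dict of strings according to the loaded schema."""
    return {name: typ(raw[name]) for name, typ in schema.items()}

schema = load_schema(SCHEMA_DOC)
print(parse_record({"user_id": "42", "event": "click"}, schema))
# {'user_id': 42, 'event': 'click'}
```

Because the job only reads the schema at startup, adding a field becomes a document change plus a restart, rather than a code change plus a redeploy.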

Ilya Afanasyev - PeerSpot reviewer
Real User
Top 5
Aug 3, 2022

The solution could be more user-friendly, and the debugging system could be improved in the next release.

Ertugrul Akbas - PeerSpot reviewer
Real User
Top 5
Jul 29, 2021

There is a learning curve, and it takes time to learn. The initial setup is complex and could be simplified.

Armando Becerril - PeerSpot reviewer
Real User
Top 5
Mar 3, 2021

One way to improve Flink would be to enhance integration between different ecosystems. For example, there could be more integration with other big data vendors and platforms similar in scope to how Apache Flink works with Cloudera. Apache Flink is part of the same ecosystem as Cloudera, and for batch processing it's actually very useful, but for real-time processing there could be more development of the big data capabilities across the various ecosystems out there.

I am also looking for more possibilities in terms of what can be implemented in containers rather than in Kubernetes. I think our architecture would work really well with more options available to us in this sense.

Finally, it's a challenge to find people with the appropriate skills for using Flink. There are a lot of people who know what should be done better in big data systems, but there are still very few people with Flink capabilities.

JV
Real User
Feb 2, 2021

I am using the Python API and I have found the solution to be underdeveloped compared to others. There needs to be better integration with notebooks to allow for more practical development. Additionally, there are no managed services. For example, on Azure, you would have to set everything up yourself. In a future release, they could improve on making the error descriptions more clear.

RP
Real User
Nov 8, 2020

Flink has become a lot more stable, but the machine learning library is still not very flexible. Some models are not plug-and-play. In order to use some of the libraries and models, I need a Python library, because there might be pre-processing or post-processing requirements, or even just to parse and use the models. The lack of Python support is something they could work on in the future.

VI
Real User
Oct 21, 2020

In terms of improvement, there should be better reporting. You can integrate with reporting solutions, but Flink doesn't offer one itself; they're more focused on the processing side, and low-latency reporting is out of their scope. As far as low latency is concerned, you can integrate with other backend solutions as well; they have that flexibility. The APIs are good enough, and its in-memory processing is so fast that you can develop against the data very quickly.

RA
Real User
Oct 19, 2020

In Flink, maintaining the infrastructure is not easy. You have to design the architecture well. If you want to scale to a larger volume of streaming data, you need good machines. You need a resilient architecture so that if it fails, you can recover with minimal downtime. You should have good storage systems to store and retrieve intermediate Flink state (in the case of stateful applications). Basically, you face all the problems that come with a distributed system, so you have to have all that infrastructure in place for it to perform well. The best way is to look at the use cases you wish to support 5-10 years ahead and design the architecture around Flink accordingly.
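The recover-with-minimum-downtime point can be illustrated with a toy checkpoint-and-restore loop. This is only a conceptual sketch: real Flink takes coordinated distributed snapshots of operator state (to RocksDB, S3, etc.), whereas here we just persist a running count to a local file and read it back as a restarted job would.

```python
import json
import os
import tempfile

# Toy illustration of checkpointing: periodically persist operator state
# so a restarted job can resume from the last checkpoint instead of
# reprocessing everything. Not Flink's actual snapshot mechanism.
def process(events, state, checkpoint_path, every=2):
    for i, e in enumerate(events, 1):
        state["count"] = state.get("count", 0) + e
        if i % every == 0:  # periodic checkpoint
            with open(checkpoint_path, "w") as f:
                json.dump(state, f)
    return state

def recover(checkpoint_path):
    """What a restarted job would do: reload the last persisted state."""
    with open(checkpoint_path) as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "ckpt.json")
state = process([1, 2, 3, 4], {}, path)  # checkpoints after events 2 and 4
restored = recover(path)
print(state, restored)  # {'count': 10} {'count': 10}
```

The operational cost the reviewer describes comes from running this idea at scale: the checkpoint store must be fast and durable, and recovery time depends on how much state has to be reloaded.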

BH
Real User
Oct 13, 2020

The TimeWindow feature: the timing and windowing behavior changed a bit in 1.11. They introduced watermarks; a watermark basically associates data in the stream with a timestamp. The documentation can be consulted, but they have updated the rest of the documentation and not the testing documentation, so we have to experiment manually to understand a few concepts.

Integration of Apache Flink with other metric services or failure-handling tools also needs some updating, or else in-depth knowledge of those tools is expected before integrating. Consider a use case where you want analytics on how much data you have processed and how much has failed. Prometheus is one of the common metric tools supported out of the box by Flink, along with other metric services, and that documentation is straightforward. But there is a learning curve with metric services that can consume a lot of time if you are not well versed in those tools. Flink provides basic documentation for failure handling, such as restart on task failure, fixed-delay restart, etc.
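The watermark idea mentioned above can be sketched concretely. Flink's bounded-out-of-orderness strategy conceptually emits a watermark that trails the maximum event timestamp seen so far by a fixed allowed lateness; the plain-Python sketch below mirrors that idea, not Flink's actual implementation.

```python
# Conceptual sketch of bounded-out-of-orderness watermarks: the watermark
# trails the maximum event timestamp seen so far by a fixed allowed
# lateness, so late events within that bound are still assigned to the
# right window. This mirrors the idea, not Flink's implementation.
def watermarks(event_timestamps, max_out_of_orderness):
    max_ts = float("-inf")
    out = []
    for ts in event_timestamps:
        max_ts = max(max_ts, ts)           # track the latest event time seen
        out.append(max_ts - max_out_of_orderness)
    return out

# Events arrive out of order (103 after 105), but the watermark never
# moves backwards.
print(watermarks([100, 105, 103, 110], 5))  # [95, 100, 100, 105]
```

Once the watermark passes the end of a time window, the window can fire, because no more events earlier than the watermark are expected.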

JR
Real User
Oct 13, 2020

We have a machine learning team that works with Python, but Apache Flink does not have full support for the language. We needed to use Java to implement some of our job posting pipelines.

SD
Real User
Oct 7, 2020

The state maintains checkpoints using RocksDB or S3. They are good, but sometimes performance is affected when you use RocksDB for checkpointing.

With Apache Storm, we can write Python bolts/applications inside the Storm code, as it supports Python as a programming language, but with Flink, the Python support is not that great. When we do data science or machine learning work, we want to integrate the ML pipeline with our real-time pipeline, and most of that work is in Python. This was very easy with Storm, which supports Python natively. But Flink is mostly Java, and integrating Python with Java is difficult, so there is no direct integration; we had to find an alternative.

We created an API layer in between, so the Java and Python sides communicate through an API: we call the data science or ML models through the API, which runs in Python, while Flink runs in Java. We would like to see another way to run this. What exists currently is not that great, and this is an area where we would like to see improvement.
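The API-layer workaround the reviewer describes can be sketched as a tiny HTTP model server on the Python side, which the Java Flink job would call per record (or per batch). Everything here is a hypothetical stand-in: the `score` function, the endpoint path, and the request format are illustrative, and the client call simulates what the Java side would do.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

def score(features):
    """Stand-in for a real ML model; returns a dummy score."""
    return {"score": sum(features)}

class ModelHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        resp = json.dumps(score(body["features"])).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(resp)))
        self.end_headers()
        self.wfile.write(resp)

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# What the Java/Flink side would do, simulated here with urllib:
req = Request(f"http://127.0.0.1:{server.server_port}/score",
              data=json.dumps({"features": [1.0, 2.5]}).encode(),
              headers={"Content-Type": "application/json"})
result = json.loads(urlopen(req).read())
server.shutdown()
print(result)  # {'score': 3.5}
```

The cost of this design is an extra network hop per call, which is exactly the overhead the reviewer would like Flink's Python support to make unnecessary.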

Apache Flink is an open-source batch and stream data processing engine. It can be used for batch, micro-batch, and real-time processing. Flink offers a programming model that combines the benefits of batch processing and streaming analytics by providing a unified programming interface for both kinds of data sources, allowing users to write programs that seamlessly switch between the two modes. It can also be used for interactive queries. Flink can be used as an alternative to MapReduce for executing...