Joaquin Marques - PeerSpot reviewer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC
Real User
Top 5 Leaderboard
A useful solution to set up workflows and processes
Pros and Cons
  • "Designing processes and workflows is easier, and it assists in coordinating all of the different processes."
  • "The graphical user interface can be improved."

What is our primary use case?

Our primary use case for the solution is setting up workflows and processes. These apply everywhere, because most industries are based on workflows and processes. We've deployed it for all kinds of workflows within the organization.

What is most valuable?

The ability to easily set up and deploy workflows with Airflow is valuable. Additionally, designing processes and workflows is easier, and it assists in coordinating all of the different processes.

What needs improvement?

The solution can be improved by creating a tool that allows us to do these kinds of things graphically instead of just writing scripts. Hence, the graphical user interface can be improved.

For how long have I used the solution?

We have been using the solution for approximately one year and are currently using the latest version.

Buyer's Guide
Apache Airflow
April 2024
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
769,236 professionals have used our research since 2012.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

The solution is scalable. Hundreds of thousands of people are using it.

How are customer service and support?

We have not had any issues that require customer service and support.

How was the initial setup?

The initial setup is intermediate, and two people are required for deployment.

What was our ROI?

There is a significant return on investment because it's free, open source, and very useful.

What other advice do I have?

I rate the solution an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Global Data Architecture and Data Science Director at FH
Real User
Expert Moderator
Managing large-scale data pipelines and Python tasks has been made easy
Pros and Cons
  • "I found the following features very useful: DAG - Workload management and orchestration of tasks using."
  • "UI can be improved with additional user-friendly features for non-programmers and with fewer coding requirements for practitioners."

We have been using Apache Airflow for the past 2 years for various use cases such as: 

  • Data Pipeline building and monitoring
  • Automation of data extraction processes and Intelligent Automation
  • Web Scraping at scale for financial services 

We manage large-scale data processing workloads using DAGs (Directed Acyclic Graphs), a core concept of Apache Airflow (commonly known as Airflow) that expedites error handling and logging. It helped us manage complex workflows and orchestrate tasks efficiently.
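To illustrate the DAG idea for readers new to it, here is a minimal sketch in plain Python. It does not use Airflow's API; it relies on the standard-library `graphlib` module, and the task names and the `run_pipeline` helper are hypothetical. It runs tasks in dependency order, logs each result, and stops the run at the first failure.

```python
import logging
from graphlib import TopologicalSorter

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")


def run_pipeline(tasks, upstream):
    """Run callables in dependency order and stop at the first failure.

    tasks:    task name -> zero-argument callable (hypothetical tasks)
    upstream: task name -> set of task names that must finish first
    Returns the names of the tasks that completed successfully.
    """
    completed = []
    # static_order() yields every task after all of its upstream tasks.
    for name in TopologicalSorter(upstream).static_order():
        try:
            tasks[name]()
        except Exception:
            # Log the traceback and stop the whole run.
            log.exception("task %s failed; stopping the run", name)
            break
        log.info("task %s succeeded", name)
        completed.append(name)
    return completed


# Illustrative three-step pipeline: extract -> transform -> load.
tasks = {
    "extract": lambda: None,
    "transform": lambda: None,
    "load": lambda: None,
}
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

completed = run_pipeline(tasks, upstream)
print(completed)
```

An orchestrator such as Airflow adds scheduling, retries, and a UI on top of this basic ordering idea.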

I found the following features very useful:

  • DAG - Workload management and orchestration of tasks using 
  • TaskFlow API - moving Python tasks has been made easy, with cleaner DAGs using the @task decorator in Python
  • Connection and Hooks - interface to connect external systems

To implement Airflow's various useful functionalities effectively, you need to be a very good Python programmer. The UI can be improved with additional user-friendly features for non-programmers and with fewer coding requirements for practitioners.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Engineering Manager - OTT Platform at Amagi
Real User
Helps us maintain a clear separation of our functional logic from our operational logic
Pros and Cons
  • "The reason we went with Airflow is its DAG presentation, that shows the relationships among everything. It's more of a configuration-driven workflow."
  • "One specific feature that is missing from Airflow is that the steps of your workflow are not pipelined, meaning the stageless steps of any workflow. Not every workflow can be implemented within Airflow."

What is our primary use case?

We are a technology, media, and entertainment-technology company. We are using Apache Airflow for architecting our media workflows. We are using it for two major workflows.

We have had it set up for some time on our own cloud. Recently, we migrated the setup to AWS.

How has it helped my organization?

Airflow is our first choice because we wanted a clear separation of our functional logic from our operational logic. We don't want our microservices to have the cross-cutting responsibilities of our operational logic. Right now, our microservices are the core business' inner functional logic. The majority of our distribution, our decision making, and the majority of our workflow operational responsibilities have been added to Airflow.

What is most valuable?

The reason we went with Airflow is its DAG presentation, that shows the relationships among everything. It's more of a configuration-driven workflow. 

It's all Python, as well. The majority of the configuration is Python-friendly.

What needs improvement?

One specific feature that is missing from Airflow is that the steps of your workflow are not pipelined, meaning the stageless steps of any workflow. Not every workflow can be implemented within Airflow. For example, Step 1 of my workflow will have output which I definitely want to automatically be provided as an input to my Step 2. At the workflow level, we want to have common state management where, across steps, we'll be able to reach the state information. Right now, we're using an external state repository to maintain the state.
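The kind of pipelining described here, where each step's output feeds the next step and all steps share workflow-level state, could look like the following plain-Python sketch. It is only an illustration of the idea, not Airflow code; the `run_steps` helper and the media-related step names are hypothetical.

```python
def run_steps(steps, initial=None):
    """Run steps in order, feeding each step's output to the next one.

    steps: list of (name, function) pairs; each function receives
           (previous_output, state) and returns its own output.
    Returns (final_output, state), where state maps step name -> output.
    """
    state = {}
    output = initial
    for name, func in steps:
        output = func(output, state)
        state[name] = output  # any later step can read this entry
    return output, state


# Hypothetical steps for a media workflow.
def probe(_previous, _state):
    return {"duration_s": 90, "codec": "h264"}


def plan(previous, _state):
    # Consumes the output of the step directly before it.
    return {"segments": previous["duration_s"] // 30}


def publish(previous, state):
    # Uses the previous output and an earlier step's state.
    return f'{previous["segments"]} segments, codec {state["probe"]["codec"]}'


result, state = run_steps([("probe", probe), ("plan", plan), ("publish", publish)])
print(result)
```

Here the state lives in an in-memory dictionary; the external state repository mentioned above plays the same role across separate processes.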

If Airflow could come up with some kind of implementation, where not every step of the pipeline is an independent step, that would be helpful. I would like it if a part of the output of your previous steps could be the input for your next step. That kind of pipeline is missing. When we consider other products like jBPM, Camunda, or Cadence, they have the concept of pipelining.

I would also like to see support for more platforms, in terms of programming BPMs. Cadence supports Golang and Java. Legacy components can be from any platform, so if they could provide more client support, such as Java and Golang client libraries, that would be helpful. I want to be able to program it in Java.

For how long have I used the solution?

I have been using Apache Airflow for more than a year.

What do I think about the scalability of the solution?

It's definitely scalable.

We have been using Airflow for some time but we are not heavily dependent on it. We only have a couple of use cases being executed by Airflow.

Because we have some data engineering problems, we have a good amount of analytics systems. We have a high volume of data that comes into our system, along with a lot of email, and we have to have an automated data pipeline. Given that, we have all these computing capabilities that are built of microservices. The beauty of it is its scalability. It has every step of your workflow, and it has scheduler capabilities. Every step of your workflow is delegated to one of your nodes. That is being scaled per your computing needs.
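As a rough illustration of a scheduler delegating each ready step to a pool of workers, here is a plain-Python sketch. It uses a local thread pool in place of real worker nodes and the standard-library `graphlib` module in place of Airflow's scheduler; the `run_on_workers` helper and the step names are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_on_workers(tasks, upstream, max_workers=2):
    """Dispatch each step to a worker as soon as its upstream steps finish.

    tasks:    step name -> zero-argument callable
    upstream: step name -> set of step names that must finish first
    Returns step names in the order in which they completed.
    """
    sorter = TopologicalSorter(upstream)
    sorter.prepare()
    finished = []
    running = {}  # future -> step name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Hand every step whose dependencies are met to a worker.
            for name in sorter.get_ready():
                running[pool.submit(tasks[name])] = name
            # Block until at least one running step completes.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any error from the step
                finished.append(name)
                sorter.done(name)  # unblocks downstream steps
    return finished


# Hypothetical media workflow: two middle steps can run in parallel.
steps = {name: (lambda: None) for name in
         ("ingest", "thumbnails", "transcode", "package")}
upstream = {
    "ingest": set(),
    "thumbnails": {"ingest"},
    "transcode": {"ingest"},
    "package": {"thumbnails", "transcode"},
}

finished = run_on_workers(steps, upstream)
print(finished)
```

Raising `max_workers` is the local stand-in for adding worker nodes: more independent steps can run at the same time.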

We are still evolving. Our business processes are not completely automatic. We're still in the process of identifying what all the automation cases are that we can bring under Airflow. We would like to leverage one common orchestrator or workflow BPM for our complete ecosystem. So we have some architects in our system who are happy with Airflow and others who would like to migrate to some other BPM like Cadence or Apache NiFi. There are a lot of orchestrators and we're just out of the gate. Airflow is still not being heavily used in our enterprise.

Which solution did I use previously and why did I switch?

This is the first workflow BPM tool that we are using in our platforms.

How was the initial setup?

There is comprehensive documentation for setting up a simple workflow and you just follow the documentation for setting things up. We're all engineers so we don't mind if the steps are lengthy, in terms of setting up the system. I'm quite okay with the documentation provided for getting your system up and running.

But I would appreciate it if they published a portal where we could see in what way other businesses, or other technology companies are solving their problems, with some case studies, using Airflow. It would help us to review their case studies. My biggest problem at the time when I was deciding whether Airflow fit our needs or not, was that I was looking for some case studies of technology companies that are already using the solution. With Camunda and jBPM, there is a good quantity of case studies available online.

Which other solutions did I evaluate?

There is no scarcity of BPMs. There are many products online: either open-source or community products or licensed products. There are many good BPMs. The reason that Airflow is in my system is that some of our workflows which we have onboarded are also on Python. Airflow complements that. But the first and foremost ability of any orchestrator should be to integrate with any underlying platform, be it a Java platform or a Python platform. That's the beauty of an orchestrator.

What other advice do I have?

We have a team of people, four to five team members, who initially evaluated Airflow and wanted to implement it.

We have customers onboarded on our legacy systems. I cannot disrupt the service and bring everything into Airflow. I have to onboard Airflow seamlessly, while I protect my current, ongoing business systems. So I'm trying to balance things here. We have only been able to onboard a couple of workflows. Eventually, we want to do it more fully, but there were a few challenges as I told you: There is no pipeline to take information, which is forcing me to retain my state in a separate state repository. That would be the next big area where I would like to see improvement.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Solutions Architect/ Software Architect at a comms service provider with 51-200 employees
Real User
Integrates well with other pipelines and builds different processes well but the scalability needs improvement
Pros and Cons
  • "The product integrates well with other pipelines and solutions."
  • "The scalability of the solution itself is not as we expected. Being on the cloud, it should be easy to scale, however, it's not."

What is our primary use case?

We normally use the solution for creating a specific flow for data transformation. We have several pipelines that we use and due to the fact that they're pretty well-defined, we use it in conjunction with other tools that do the mediation portion. With Airflow, we do the processing of such data.

What is most valuable?

The product integrates well with other pipelines and solutions.

The ease of building different processes is very valuable to us. The difference between Kafka and Airflow is that Airflow is better for dealing with the specific flows where we want to do some transformation. It's very easy to create flows.

What needs improvement?

The graphics in the past have not been ideal.

We have several areas where we feel they could improve in terms of being a little bit more flexible. One is implementation. Even though we customized it, there were some specific things we had to do with the image by itself.

The management integration was challenging as well. It requires a lot of work on our end. We were creating our own way to integrate things specifically with specific tools. There's not really an ease of management out-of-the-box option for integration. We needed to become a little bit creative to solve that ourselves.

The scalability of the solution itself is not as we expected. Being on the cloud, it should be easy to scale, however, it's not.

There is no SDC versioning. There's no version control for pipelines. We have to build several pipelines for several flows, yet there's no version control to generate them.

There's no Python SDK. We need to generate our own scripts and upload them and put them there. However, there's not a realistic case that we can get connected to them. On top of that, the API sets that are provided are very limited. They are not as rich as others. You cannot do much with them.

For how long have I used the solution?

I've been using the solution for maybe three years at this point. It hasn't been too long.

What do I think about the stability of the solution?

The solution is largely stable. Obviously when you start creating more use cases, then you realize the limitations, however, it's not really, really bad.

What do I think about the scalability of the solution?

Due to the fact that the solution is on the cloud, we thought it would be fairly easy to scale. This is proving not to be the case and scalability is limited.

The challenging part is to make it really flexible in a cloud-native environment. With other applications, what you have there is the scalability that can be sensitive to your needs, based on the amount of data you are putting into the flow.

Instead of you having to create your own logic to scale it up, it should be a little more efficient on how it gets integrated into the whole environment. You have to get a little bit creative and put some commands and some logic in there and be monitoring everything. You build everything - versus other options that are more out of the box. With other solutions, if you have these bursts of data they ultimately can scale up and they are more native.

How are customer service and technical support?

Technical support has been pretty good. We don't really have anything to complain about. We're satisfied with the service so far.

Which solution did I use previously and why did I switch?

For this particular category, we tested other tools, but they were too much for what we needed. We have also used other products in other projects, and nothing really worked for us. Because Airflow is a bit different, we decided that it was a nice player and a good open-source tool.

We do use other tools. However, this one seems to work quite well for us.

How was the initial setup?

The initial setup isn't as straightforward as we hoped. It's not as flexible as other options. You need to be a bit creative during the process.

What's my experience with pricing, setup cost, and licensing?

This product is open-source.

What other advice do I have?

We're just customers and end-users. We don't have a special business relationship with Apache.

I'm not sure which version of the solution we're using. It's likely the most up-to-date, or at most two or three versions back, as we are not using any of the older versions.

I'd advise others considering the solution to first understand what exactly you're trying to achieve. Otherwise, you either select a non-cloud-native Apache workflow manager or select something that is way too big for what you are actually trying to achieve. Understand exactly what you need, the volumes that you need, and what exactly the use cases are.

After that, in terms of deployment, it depends on what exactly you are trying to do. If all of your solutions are cloud-native, try to do it with a cloud-native tool. Specifically, go to the CMCS site and look into the solutions that are there. Those have been tested at least for the cloud-native solutions that exist.

Then, just make sure that the components you have will match and will be available to whatever you're trying to build. For example, the user management is something that is important for us and for this specific setup. Probably for some others, it's not going to be. 

Take into consideration what the different connection points are, and make sure that they are either supported or that you can support the integration of such items. You need to have a proper developer who can help you build your connector or your API.

In general, I would rate the solution at a seven out of ten. If they fix the APIs and the price on LTK, I'd rate it closer to a nine.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Solution Architect at EPAM Systems
Real User
Top 5
Simple to automate using Python, but code does not cover all data warehousing tasks
Pros and Cons
  • "This is a simple tool to automate using Python."
  • "We need to develop our workflow description and notations because out of the box, Apache Airflow does not provide some features that are needed."

What is our primary use case?

The primary use case for this solution is to automate the ETL process for a data warehouse.

What is most valuable?

The most valuable feature is the UI, for automation. One can monitor all ETL processes on a single screen. Complex workflows are shown as SVG images of DAGs.

This is a simple tool to automate using Python.

What needs improvement?

There are some drawbacks to this solution. The code does not cover all tasks in the data warehouse automation process. Currently, in production, we have a large installation with a complex workflow that includes hundreds of tasks. Most of them are dispatched by the existing engine, but not all.
For example, sometimes we need to create cycles in our workflow but we are not able to, because Airflow supports only Directed Acyclic Graphs (DAGs).
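The acyclic restriction can be shown with a small plain-Python sketch. It uses the standard-library `graphlib` module rather than Airflow's own validation, and the `find_cycle` helper and task names are hypothetical. A graph in which two tasks depend on each other has no valid execution order, so it is rejected.

```python
from graphlib import CycleError, TopologicalSorter


def find_cycle(upstream):
    """Return the tasks forming a cycle, or None if the graph is acyclic.

    upstream: task name -> set of task names that must finish first
    """
    try:
        TopologicalSorter(upstream).prepare()
    except CycleError as err:
        # graphlib reports the cycle as a list whose first and last
        # entries are the same node.
        return err.args[1]
    return None


# A valid workflow: load waits for stage.
acyclic = {"stage": set(), "load": {"stage"}}

# An invalid workflow: load and validate each wait for the other.
cyclic = {"load": {"validate"}, "validate": {"load"}}

print(find_cycle(acyclic))
print(find_cycle(cyclic))
```

A common workaround for loop-like logic is to keep the loop inside a single task, or to re-run the whole graph, so that the graph itself stays acyclic.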

We need to develop our workflow description and notations because out of the box, Apache Airflow does not provide some features that are needed. It is our understanding that it is limited by design.

We will wait for the latest 2.0 version, as it is expected to be much more mature than the 1.8-1.10 versions. We believe that it will be better.

There should be some improvement made to the Doc Management features from within the UI. They should think about Outlook integration, which should be out of the box, and the object model should be expanded to support cyclic graphs inside the workflow.

For how long have I used the solution?

We have been using this solution for eighteen months.

What do I think about the stability of the solution?

This solution is not very stable. There are a number of configuration issues.

What do I think about the scalability of the solution?

This solution is scalable. We use this solution on a single node, but it is possible to have a cluster of workers.

It can be used for one or two thousand related tasks, but that should be done in a cluster configuration.

We don't use a cluster, rather we only use single nodes. It is sufficient for our tasks. Tasks are long and the parallelism is limited by the database engine, and not by the workflow engine. 

We would like to evaluate clusters in the future.

We are using the Cron Task scheduling feature for Apache Airflow. Users can configure the Apache Airflow themselves. There are up to ten users that can configure Apache Airflow.

This is a part of the wage solution, and it is the initial point of the wage slot process. The wage solution has hundreds of users.

How are customer service and technical support?

We don't use any paid technical support, as it is an open-source solution. We have used Stack Overflow and other open information sources, but we know that some companies provide technical support. 

Having studied their solutions that are available on the internet, it is my understanding that we are at a pretty high level and could provide commercial support ourselves.

We don't use any support from commercial companies, but we have been able to find some very useful recent solutions on the Apache Airflow GitHub, for example.

Which solution did I use previously and why did I switch?

Previously, we used Control-M for a short period. It was a solution used by our customers, and we needed to understand their difficulties and the results. 

For low- to mid-scale tasks, Apache Airflow could be a substitute for Control-M.

How was the initial setup?

The deployment model we used was through a private cloud. It was a private installation on Google Cloud.

What about the implementation team?

In-house team.

What was our ROI?

It's being measured just now. Precise data is expected in 3-4 months. First conclusion: positive ROI.

What's my experience with pricing, setup cost, and licensing?

There are no costs associated with this solution. Apache Airflow is a free solution that can be downloaded and ready for use at any moment.

Which other solutions did I evaluate?

Our tasks can be automated by simple Jenkins, but our customer wanted to implement it on Apache Airflow. This was a solution used by our customer.

Apache Airflow is mainstream and everyone wants to use it. Google provides Apache Airflow as part of the Cloud services.

What other advice do I have?

My advice would be to use this solution for simple tasks. 

You should have a Python expert for features that are not available out of the box, as the built-in functionality is not enough.

It could be a good solution for enterprise workflow automation and solutions like Control-M within the next two to three years.

We are happy and satisfied with this solution, but not fully satisfied, as this solution has some positive and negative aspects.

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Associate Director - Technologies at a tech services company with 51-200 employees
Real User
Quick and easy to set up, but the technical support needs to be improved
Pros and Cons
  • "The initial setup was straightforward and it does not take long to complete."
  • "Technical support is an area that needs improvement."

What is our primary use case?

Our primary use case is to integrate with SLAs.

What is most valuable?

The most valuable feature is the workflow.

What needs improvement?

Technical support is an area that needs improvement. The contact numbers should be readily available so that we can call to get support as required.

In the future, I would like to see a single-click installation.

For how long have I used the solution?

We have been working with Apache Airflow for approximately one month.

What do I think about the scalability of the solution?

In our company, we are doing a POC and there are only three users. We have also implemented it for clients.

We do plan to increase our usage and the POC that we are now working on is something that we will implement for other clients if it works.

How are customer service and technical support?

We are not satisfied with technical support. We rely on using Google to identify solutions for the problems we have.

Which solution did I use previously and why did I switch?

We did not use another similar solution prior to Airflow.

How was the initial setup?

The initial setup was straightforward and it does not take long to complete. The deployment took no more than an hour.

Which other solutions did I evaluate?

We evaluated Control-M and another similar product from IBM.

What other advice do I have?

This is a good product and I definitely recommend it.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Virksomhedskonsulent - Digitalisering, Forretningsudvikling, BPM, Teknologi & Innovation at a consultancy with 51-200 employees
Real User
Scalable, stable and simple installation
Pros and Cons
  • "We have been quite satisfied with the stability of the solution."
  • "The way the dashboard is connected into the BPM flow could be improved."

What is our primary use case?

We mainly used the solution in banking, finance, and insurance. We are looking for some opportunities in production companies, but this is only at the very early stages.

What is most valuable?

I do not have specific feedback because it is quite early in the review stage for comment.

What needs improvement?

The way the dashboard is connected into the BPM flow could be improved.

For how long have I used the solution?

I have been using the solution for half a year.

What do I think about the stability of the solution?

We have been quite satisfied with the stability of the solution.

What do I think about the scalability of the solution?

The scalability of the solution is good.

How are customer service and technical support?

We had no issue with technical support.

How was the initial setup?

The installation is straightforward.

What's my experience with pricing, setup cost, and licensing?

The pricing for the product is reasonable.

Which other solutions did I evaluate?

We are evaluating Camunda as well as this solution. We are investigating and trying to determine how suitable they are for production facilities. Additionally, we are looking at which types of processes the solutions are actually suitable for.

What other advice do I have?


We are unsure which solution we will end up with; we are currently testing them. We are trying to get into new business types and new industries. We are looking into how well the solutions can be used in production facilities.

I rate Apache Airflow an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Apache Airflow Report and get advice and tips from experienced pros sharing their opinions.
Updated: April 2024