SabinaZeynalova - PeerSpot reviewer
Data Engineer Team Lead at Unibank
Real User
Top 5
Can be used with multiple systems and servers, Kubernetes systems, and dashboard systems
Pros and Cons
  • "The product is stable."
  • "There is a need for more features for experiment evaluation steps."

What is our primary use case?

We use Apache Airflow for the automation and orchestration of model deployment, training, and feature engineering steps. It is a model lifecycle management tool.

How has it helped my organization?

We have an integration with Apache Airflow in our portal for messaging. We use group and transformation data from Redshift to Tesco, and then create a call flow to the router. This is a source of data leakage, such as data engineering and machine learning, especially in a HIPAA environment. We need to check the evolution steps in the pipeline. In production, we only have two cases. Sometimes, we need customer data not in the database, which we get from object storage. The call flow from Redshift to Tesco involves transforming the data and then generating it with the router or Kibana router for the policy. The data is then transformed and sent to the dashboard or data warehouse.

What needs improvement?

Airflow is a pipeline for transferring code by clients, but Apache Airflow does not have any solution for model experimentation. There is a need for more features for experiment evaluation steps.

For how long have I used the solution?

I have been using Apache Airflow for one and a half years.

Buyer's Guide
Apache Airflow
April 2024
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
767,847 professionals have used our research since 2012.

What do I think about the stability of the solution?

The product is stable. I rate the solution’s stability an eight out of ten.

What do I think about the scalability of the solution?

20 users are using this solution in our organization. I rate the solution’s scalability an eight out of ten.

How was the initial setup?

The initial setup is not complex and can be done by two people. However, open-source on-premises solutions have some difficulties. We can schedule Apache Airflow on Kubernetes. Space limitations and installation issues may arise, as we do not have full control over Kubernetes cluster resources, and our administrative access is limited. I rate the initial setup a six out of ten, where one is difficult, and ten is easy.

What other advice do I have?

I recommend Apache Airflow because it is still profitable and can be used with multiple systems and servers, Kubernetes systems, and dashboard systems. You can use it to get social media and other data, but it can be expensive. Overall, I rate the solution a nine out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
ManojKumar43 - PeerSpot reviewer
Big Data Engineer at BigTapp Analytics
Real User
A solution for orchestrating EMR clusters with plug-and-play UI
Pros and Cons
  • "Apache Airflow is easy to use and can monitor task execution easily. For instance, when performing setup tasks, you can conveniently view the logs without delving into the job details."
  • "Airflow should support dynamic DAG creation."

What is our primary use case?

I have used Apache Airflow for various purposes, such as orchestrating Spark jobs, EMR clusters, and Glue jobs, and submitting jobs within the DCP data flow on Azure Databricks. I have also used it for ad hoc queries, for instance when there is a need to execute queries on Redshift or other databases.

How has it helped my organization?

If you are working with APIs or databases, you must write SQL queries and formulate the right statements to retrieve everything. But with the UI, it's more like plug-and-play. You go there, select the task you want to see, like logs, and click on it. It will promptly display the details of the logs, automatically showing the returned logs. However, if you're accessing logs manually from the web server, you must write commands and perform additional tasks. These overheads can be efficiently managed using the UI.

What is most valuable?

Apache Airflow is easy to use and can monitor task execution easily. For instance, when performing setup tasks, you can conveniently view the logs without delving into the job details. All logs are readily accessible within the interface itself. Examining the logs lets you discern which steps and processes are being executed.

You don't have to configure SMTP separately for everything. You only need to configure email settings, such as email on error, failure, or alert. With Apache Airflow, you can send emails with just a few lines of code instead of writing extensive code to configure SMTP.
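As a rough illustration of the "few lines of code" the reviewer describes, here is a minimal sketch. It assumes Airflow 2.4 or later in the 2.x line, with SMTP already configured in the deployment's settings. The email address, DAG id, and task are hypothetical placeholders, not taken from the review.

```python
from datetime import datetime

# Hypothetical alert settings; Airflow applies default_args to every task in the DAG.
DEFAULT_ARGS = {
    "email": ["data-alerts@example.com"],  # placeholder address
    "email_on_failure": True,   # send mail when a task fails
    "email_on_retry": False,    # stay quiet on retries
    "retries": 1,
}


def build_dag():
    """Build a one-task DAG. Airflow is imported lazily so this file loads without it."""
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="example_email_alerts",  # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule=None,  # run only when triggered
        catchup=False,
        default_args=DEFAULT_ARGS,
    ) as dag:
        PythonOperator(task_id="load_file", python_callable=lambda: print("loading"))
    return dag
```

In a real DAG file, the result of `build_dag()` would be assigned to a module-level variable so that the scheduler can discover it.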

I managed a complex workflow for a finance application project. They use Apache Airflow to orchestrate processes, such as retrieving data from SFTP and landing it into S3. From S3, they trigger Glue jobs based on certain conditions. Additionally, they use the Glue catalog in Glusoft for data management, all orchestrated using Airflow. Furthermore, various logics are written in Airflow DAGs to handle scenarios like security mismatches. For instance, files are sent accordingly if there's a missing security.

Apache Airflow triggers a set of tasks based on DAGs. If you have multiple DAGs, such as for the raw, transform, and ready layers, you do not have to trigger each DAG manually. You can integrate them so that triggering one automatically triggers the others. You can also add conditions.
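The chaining described above could look roughly like the following sketch, again assuming Airflow 2.4 or later in the 2.x line. The DAG ids and the `has_new_files` condition are hypothetical. The sketch uses `TriggerDagRunOperator` for the chaining and `ShortCircuitOperator` for the condition, which is one possible approach and not necessarily what the reviewer's team used.

```python
from datetime import datetime

# Hypothetical downstream DAG ids, one per layer, in the order they should run.
DOWNSTREAM_DAGS = ["transform_layer", "ready_layer"]


def has_new_files(**context):
    """Placeholder condition; returning False skips every downstream task."""
    return True


def build_raw_dag():
    """Build the raw-layer DAG. Airflow is imported lazily so this file loads without it."""
    from airflow import DAG
    from airflow.operators.python import ShortCircuitOperator
    from airflow.operators.trigger_dagrun import TriggerDagRunOperator

    with DAG(
        dag_id="raw_layer",
        start_date=datetime(2024, 1, 1),
        schedule=None,
        catchup=False,
    ) as dag:
        # The condition gates everything that follows it.
        previous = ShortCircuitOperator(
            task_id="check_new_files", python_callable=has_new_files
        )
        for dag_id in DOWNSTREAM_DAGS:
            trigger = TriggerDagRunOperator(
                task_id=f"trigger_{dag_id}",
                trigger_dag_id=dag_id,
                wait_for_completion=True,  # finish one layer before starting the next
            )
            previous >> trigger
            previous = trigger
    return dag
```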

What needs improvement?

Airflow should support dynamic DAG creation.

For how long have I used the solution?

I have been using Apache Airflow for over 8 years.

What do I think about the stability of the solution?

The solution is stable. 

I rate the solution's stability a nine and a half out of ten.

What do I think about the scalability of the solution?

We were using Apache Airflow on Kubernetes. As more requests came in, it scaled dynamically based on the available pods. Almost 15 data engineers are using Apache Airflow.

I rate the solution's scalability a nine out of ten.

How was the initial setup?

The initial setup is straightforward. It will be tricky if you go with an executor or Kubernetes operator.

If you're into plug-and-play convenience, Apache Airflow supports various deployment methods like Docker, Helm, or Kubernetes. If you want to spin up Airflow, it will take no more than 10-15 minutes. However, if you're making customizations or prefer not to use existing databases, the setup time could be longer because of the customization work.

What other advice do I have?

You use Apache Airflow to automate your data pipelines. When you have a data pipeline, such as a Spark job or any other job, and want to automate it, triggering the job manually is not always necessary. You need to configure these DAGs accordingly. For instance, Airflow can initiate the job when the data becomes available. We don't need to keep the cluster running all the time, 24/7. We start the cluster using Airflow when we need to submit the job. Once the job is completed, we terminate the cluster.
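The start, submit, terminate pattern described above can be sketched in plain Python. The three callables are hypothetical stand-ins for whatever actually creates the cluster, submits the job, and shuts the cluster down. The sketch only shows the control flow, not any cloud API.

```python
def run_on_ephemeral_cluster(start_cluster, submit_job, terminate_cluster):
    """Start a cluster, run one job on it, and always shut the cluster down."""
    cluster_id = start_cluster()
    try:
        return submit_job(cluster_id)
    finally:
        # Runs on success and on failure, so no cluster is left running.
        terminate_cluster(cluster_id)


# Stand-in callables that only record what happened (all names hypothetical).
events = []


def fake_start():
    events.append("start")
    return "cluster-1"


def fake_submit(cluster_id):
    events.append(f"submit:{cluster_id}")
    return "done"


def fake_terminate(cluster_id):
    events.append(f"terminate:{cluster_id}")


result = run_on_ephemeral_cluster(fake_start, fake_submit, fake_terminate)
```

In an Airflow DAG, the same shape is usually expressed as three tasks, with the terminate task set to run even if the job task fails, for example through a trigger rule such as `all_done`.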

I recommend the solution.

Overall, I rate the solution a nine out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Data Engineer at a photography company with 11-50 employees
Real User
Top 5
A tool that needs to improve its complex initial setup and limited integration capabilities but can be useful in workflow automation
Pros and Cons
  • "Apache Airflow is useful for workflow automation, making it capable of automating pipelines, data pipelines, and data warehouse processes."
  • "The problem with Apache Airflow is that it is an open-source tool. You have to build it into a Kubernetes container, which is not easy to maintain, and I find it to be very clunky."

What is our primary use case?

Apache Airflow is useful for workflow automation, making it capable of automating pipelines, data pipelines, and data warehouse processes. I don't have a strong need for Apache Airflow because I do everything with dbt (data build tool), which has its own integrated workflow process.

I use Fivetran to synchronize my data. I don't need to do any automation on that and don't have any need for workflow automation. I have everything I need.

How has it helped my organization?

We were experimenting with the solution. We never reached the point where we would deploy the solution in the production capacity.

What needs improvement?

The problem with Apache Airflow is that it is an open-source tool. You have to build it into a Kubernetes container, which is not easy to maintain, and I find it to be very clunky.

Additionally, there is room for improvement with DAGs. I had a very hard time building DAGs in Apache Airflow. I decided to use Astronomer, which is on top of Apache Airflow and is supposed to make your life easier. The best part of the solution is the third-party add-on which is Astronomer.

It would be a very nice tool if it were an entirely cloud-based solution. Apache Airflow is not so nice when you have a hybrid setup, where half is on-premises and half is in a cloud environment. It should integrate better with the outside world.

For how long have I used the solution?

I have been using Apache Airflow for a couple of months.

What do I think about the stability of the solution?

I have no opinion on the solution's stability. The solution did not get to a production capacity. I couldn't even do file processing with Apache Airflow. None of the engineers could actually help me set up Apache Airflow. I had to give up on the product. Just buy a product that works, and you will be done with it.

How was the initial setup?

The initial setup was complex to deploy on the cloud. Installing the software is very difficult. The documentation is very bad. There is no installer where you can press a button, and it does everything for you. One may need a couple of engineers to install the solution, which is an issue with open-source tools. Price-wise, the software falls on the cheaper side. With Apache Airflow, one may spend much more on engineers.

The solution is deployed purely on the cloud.

What was our ROI?

I didn't experience any ROI using the solution. I could do everything without Apache Airflow since it would have been just a money pit.

What other advice do I have?

I suggest others not use Apache Airflow. If you use Apache Airflow, you will waste your time unless you have a bunch of engineers who already know about the solution.

If you cannot write a DAG within two hours of starting the process, then forget about the tool, and it would be better if you tried to find something else.

Overall, if the tool was working properly, it would be very good, but unfortunately, it is not.

Overall, I rate the solution a five out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Pravin Gadekar - PeerSpot reviewer
Google Cloud Architect at Capgemini
Real User
Top 10
Has an efficient user interface, but its stability needs improvement
Pros and Cons
  • "The user interface for monitoring and managing workflows has been excellent, particularly in the latest version."
  • "The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues."

What is our primary use case?

We use the product to orchestrate data engines and process new data files.

What is most valuable?

The product's most valuable feature is scalability. It helps us run hundreds of data jobs every day.

What needs improvement?

The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues. It requires manual intervention to resume jobs. Additionally, while extending the code is possible, it sometimes necessitates creating custom plugins.

For how long have I used the solution?

We have been using Apache Airflow for four years.

What do I think about the scalability of the solution?

We have more than 100 Apache Airflow users in our organization.

How was the initial setup?

The initial setup on Google Cloud using Cloud Composer is straightforward and simplified. However, deploying it on-premises can be complex and challenging.

What was our ROI?

The product is worth the investment.

What's my experience with pricing, setup cost, and licensing?

It is an open-source solution, so there are no hidden fees or licensing costs associated with the software. However, users need to cover the operational costs for the actual infrastructure, such as the virtual machines (VMs).

What other advice do I have?

The directed acyclic graph (DAG) functionality in Apache Airflow has significantly enhanced our workflow management. It provides a visual representation of data processing tasks.
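To make the idea of a directed acyclic graph concrete, the sketch below uses Python's standard `graphlib` module (Python 3.9 or later) rather than Airflow itself. The task names are hypothetical. It derives a valid execution order from task dependencies and shows why cycles are not allowed.

```python
from graphlib import CycleError, TopologicalSorter

# Each task maps to the set of tasks that must finish before it (names are hypothetical).
dependencies = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"extract"},
    "load": {"validate", "transform"},
}

# A valid execution order: every task appears after all of its upstream tasks.
order = list(TopologicalSorter(dependencies).static_order())


def has_cycle(graph):
    """Return True if the dependency graph cannot be ordered, i.e. it is not acyclic."""
    try:
        list(TopologicalSorter(graph).static_order())
    except CycleError:
        return True
    return False
```

Here `validate` and `transform` may appear in either order, because neither depends on the other. A scheduler is free to run such tasks in parallel.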

The user interface for monitoring and managing workflows has been excellent, particularly in the latest version. It is difficult for beginners to use the platform, and some training is required.

I recommend the product to others, and it is much better than its competitors. It is open source. I rate it a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Mikalai Surta - PeerSpot reviewer
Head of Big Data Department at IBA Group
MSP
Top 5
Used for the orchestration of data pipelines, but it should have better integration with cloud platforms
Pros and Cons
  • "Since it's widely adopted by the community, Apache Airflow is a user-friendly solution."
  • "Apache Airflow should have better integration with cloud platforms."

What is our primary use case?

We use Apache Airflow for the orchestration of data pipelines.

What is most valuable?

Since it's widely adopted by the community, Apache Airflow is a user-friendly solution.

What needs improvement?

Apache Airflow should have better integration with cloud platforms.

For how long have I used the solution?

I have been using Apache Airflow for a couple of years.

What do I think about the stability of the solution?

Apache Airflow is not a stable solution.

What do I think about the scalability of the solution?

Around ten people are using the solution in our organization.

How was the initial setup?

The solution's initial setup is difficult and should be done by an experienced person.

What's my experience with pricing, setup cost, and licensing?

Apache Airflow is a cheap solution.

What other advice do I have?

The solution is deployed on the cloud in our organization. Before choosing Apache Airflow, users should try cloud-native services first.

Overall, I rate the solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Associate Data Engineer at an outsourcing company with 201-500 employees
MSP
Top 5
Connects to everything we need, but doesn't support development through the UI
Pros and Cons
  • "Development on Apache Airflow is really fast, and it's easy to use with the newer updates. Everything is in Python, so it's not hard to understand. They also have a graphical view, so if you are not a programmer and you are just an administrator, you can easily track everything and see if everything is working or not."
  • "Programmatically, it's very good, and it doesn't have any competitors, but you cannot develop anything in Airflow UI. You need to develop everything within the program. In the market, other tools have come up recently as competitors to Airflow, and they also give graphical programming options, whereas Airflow doesn't provide that feature currently. All the DAGs you want to build need to be coded in Python."

What is our primary use case?

We were using Apache Airflow for our orchestration needs. We used it for all the jobs that we had created in Databricks, Fivetran, or dbt. These were the three primary tools that we were using. There were a few others, but these were the three primary tools. So, Apache Airflow was for the job orchestration and connecting them to each other for building our entire data pipeline. We were also using Apache Airflow for dbt CI/CD purposes.

What is most valuable?

The most valuable feature is that it's the most popular data orchestration tool in the market right now. It connects to everything you need.

It's open-source. You have a lot of documentation and a lot of people helping out. It has large communities, so if you need something or you want to ask something, you can. Often, someone else would have already asked that question, and they would have already got the answer, and you can just look it up.

Development on Apache Airflow is really fast, and it's easy to use with the newer updates. Everything is in Python, so it's not hard to understand. They also have a graphical view, so if you are not a programmer and you are just an administrator, you can easily track everything and see if everything is working or not. For notifications, it can connect with different messaging tools such as Slack and Teams, as well as with webhooks. It's very easy to use, and it has a lot of features that you would expect from any of the data orchestration tools.
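As a hedged sketch of the webhook-style notifications mentioned above, the code below builds a failure alert using only the Python standard library. The webhook URL is a placeholder. The `{"text": ...}` payload follows the shape used by Slack-style incoming webhooks, and other tools may expect a different payload. The `context` keys match what Airflow passes to failure callbacks, but the example here runs against a stand-in object.

```python
import json
import urllib.request
from types import SimpleNamespace


def build_failure_message(context):
    """Turn a task-failure context into a short, human-readable alert."""
    ti = context["task_instance"]
    error = context.get("exception")
    return f"Task {ti.task_id} in DAG {ti.dag_id} failed: {error}"


def build_webhook_request(webhook_url, message):
    """Prepare (but do not send) a JSON POST carrying the alert text."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_failure(context, webhook_url="https://hooks.example.com/placeholder"):
    """Callback-style function: build the alert and post it to the webhook."""
    request = build_webhook_request(webhook_url, build_failure_message(context))
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Dry run with a stand-in context; nothing is sent over the network.
sample_context = {
    "task_instance": SimpleNamespace(task_id="load_file", dag_id="raw_layer"),
    "exception": ValueError("file missing"),
}
sample_message = build_failure_message(sample_context)
sample_request = build_webhook_request(
    "https://hooks.example.com/placeholder", sample_message
)
```

A function like `notify_failure` could be passed as a task's `on_failure_callback`, so the alert is sent whenever that task fails.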

What needs improvement?

Programmatically, it's very good, and it doesn't have any competitors, but you cannot develop anything in Airflow UI. You need to develop everything within the program. In the market, other tools have come up recently as competitors to Airflow, and they also give graphical programming options, whereas Airflow doesn't provide that feature currently. All the DAGs you want to build need to be coded in Python. It doesn't provide features for graphical programming. You cannot drag and drop something, build a pipeline out of that, or orchestrate that with a drag and drop. They have a graphical feature but only for administration purposes, not for development. They don't have a UI for development.

It doesn't support the Windows system. That's a big drawback because a lot of people are using Windows. 

For how long have I used the solution?

I used Apache Airflow on my previous project. We had planned to use it in our current project, but due to time issues, we were not able to deploy it. In my previous project, I used it for around eight or nine months.

What do I think about the stability of the solution?

It's a very stable product.

What do I think about the scalability of the solution?

It's highly scalable. You can scale it as much as you want. It depends on the size, and you need to scale up your instance. We had over 3,000 DAGs in our previous project, and we didn't face any issue with even 8 GB memory in our EC2 instance. If you have a lot of DAGs, you might need to scale up, but it's quite lightweight, so you don't need to worry much about that.

How are customer service and support?

It's open source. It was my first project, and I had a few doubts, but everything I needed was available on the internet, so I never had to contact their support. I might have been able to post my questions on their GitHub, but I didn't need that. Airflow has a very large community, so any questions you ask get answered there.

How was the initial setup?

Its setup wasn't done by us. It was done by the Astronomer team on Azure Community Services. So, it was deployed and set up on Azure Community Service. Everything was taken care of by the Astronomer team.

What about the implementation team?

Apache Airflow has two large and popular distributors. There might be others, but the two popular ones are Bitnami and Astronomer. For us, everything was set up by Astronomer.

What's my experience with pricing, setup cost, and licensing?

It's open source. You can install it locally on your own system. If you are deploying it in the production system, you normally deploy it on some cloud, such as EC2 service, which would have some cost. If you are setting up a Docker container or something for Apache Airflow yourself, which is quite easy, you can do pretty much everything online. I have set it up on my local system, and it doesn't take a long time. You can do customization for your project such as selecting different repository databases or selecting different Celery or web services, which is good.

If you are going with a service provider such as Astronomer or Bitnami, they will charge you because they are a distributor of Airflow. They have some of their own features and their own support. They will charge you if you are going with them.

What other advice do I have?

If you are on a Mac or Linux system, it's very easy to install. You can just go to the Apache website to install it, and you can start working. However, Apache Airflow doesn't support installation through a Windows .exe, so on Windows some knowledge of Docker containers or WSL will be useful.

Other than that, Astronomer has an instructor called Marc Lamberti who is very popular in the Airflow community. He has YouTube videos. In five minutes, he can teach you how to set up Airflow or what DAGs are. He has five or six videos, and he gets into the details with his videos. So, if you have no idea about Apache Airflow and you don't want to go through all the documentation, you can start with those videos, but if you have a Mac or Linux system, you can directly install it on your system.

I'd rate it a seven out of ten because it doesn't support Windows, and it doesn't support graphical designing, so we cannot create DAGs in the UI. We can administer and look at DAGs through the UI, but we cannot create DAGs through the UI. Other orchestration tools that are available in the market provide that feature.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Nomena NY HOAVY - PeerSpot reviewer
Lead Data Scientist at MVola
Real User
Top 10
An easy to implement and flexible solution
Pros and Cons
  • "The solution is flexible for all programming languages for all frameworks."
  • "Apache Airflow could be improved by integrating some versioning principles."

What is our primary use case?

Currently, I am a lead data scientist. Our primary use cases for Apache Airflow are for all orchestrations, from the basic big data lake to machine learning predictions. It is used for all the MLS processes. It is also used for some ELT, to transform, load, and export all big data from restricted, unrestricted, and all phase processes.

What is most valuable?

The user experience of Apache Airflow is good. The solution is flexible for all programming languages for all frameworks. I also value that it is used for monitoring. Apache Airflow helps to easily integrate data sources with other products.

What needs improvement?

Apache Airflow could be improved by integrating some versioning principles. Currently, we have to swap some DAGs in our flow. It would be useful if we could version the whole project at once and compare which scripts have changed from last year to this year, or from last month to this month.

For example, when we have a flow for one project, to version it we need to check it piece by piece to identify which DAGs and which scripts changed. All of this has to be done manually.
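The manual comparison described above can be partly automated outside Airflow. The sketch below is one hypothetical approach using only the Python standard library. It fingerprints every Python file in a project folder and compares two snapshots. The file names and hash values in the example are made up.

```python
import hashlib
from pathlib import Path


def snapshot(folder):
    """Map each Python file under `folder` to a SHA-256 hash of its contents."""
    root = Path(folder)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*.py"))
    }


def diff_snapshots(old, new):
    """Report which files were added, removed, or changed between two snapshots."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }


# Hand-written snapshots standing in for "last month" and "this month".
last_month = {"dags/sales.py": "aaa", "scripts/clean.py": "bbb"}
this_month = {"dags/sales.py": "aaa", "scripts/clean.py": "ccc", "dags/churn.py": "ddd"}
report = diff_snapshots(last_month, this_month)
```

Keeping the DAG and script files in a version control system such as Git is the more common way to get the same comparison.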

For how long have I used the solution?

I have been using Apache Airflow for four months.

What do I think about the stability of the solution?

We have experienced some bugs in Airflow. For example, the solution did not mention all the errors regarding why a process did not work. We had to investigate to try and understand why it was not working.

What do I think about the scalability of the solution?

The solution is easy to scale. We have four people in our organization that use Airflow. One is dedicated to the solution, while the others can use it to adjust the flow of their jobs on their own.

How are customer service and support?

We do not use technical support. We are trained to resolve concerns on our own. If a problem is significant, we could call support. However, there is a good developer community around Airflow that can help us resolve issues.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

Prior to using Airflow, I used Windows SSIS for three years. We made the switch because Windows SSIS uses the drag-and-drop concept, whereas Airflow requires coding. Also, Windows is orientated to Microsoft products and is not very flexible.

How was the initial setup?

I am a technician, so the initial setup is instinctive. Without experience, it would not be as simple. Experience with configurations and parameters is required. The documentation is good. However, it does not mention some features explicitly, so some research is required.

I would rate the ease of implementation a three out of five.

What about the implementation team?

We have dedicated machine learning ops, so we manage all product deployment ourselves. The deployment takes about four days, including two days of administration. 

Apache Airflow requires maintenance. It is very important to maintain all the source codes and all the data. We are looking for a platform that would facilitate the maintenance of the project.

What's my experience with pricing, setup cost, and licensing?

We use a community edition of Apache Airflow. It is open-source and free. 

What other advice do I have?

Anyone considering Apache Airflow should make sure that they have a good team with experience, including some administration. A strong background will help to understand and exploit the strengths of the platform.

I would rate this solution a nine out of 10 overall.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Lead of Monitoring Tech at a educational organization with 1,001-5,000 employees
Real User
Top 20
A good tool for managing data pipelines
Pros and Cons
  • "Since Apache Airflow works very well with Python, we can manage everything and create pipelines there."
  • "Adding more automated components in Apache Airflow for basic things like exporting the data would be helpful."

What is our primary use case?

We use Apache Airflow to send our data to a third-party system.

What is most valuable?

We are already on Python. Since Apache Airflow works very well with Python, we can manage everything and create pipelines there.

What needs improvement?

Adding more automated components in Apache Airflow for basic things like exporting the data would be helpful. Apache Airflow is not that easy to use, but we have gotten used to it.

For how long have I used the solution?

I have been using Apache Airflow for three years.

What do I think about the stability of the solution?

Apache Airflow is a stable solution.

What do I think about the scalability of the solution?

Apache Airflow is not a scalable solution for our use cases. We have a very large list of use cases. Over 10 developers use Apache Airflow in our organization.

How are customer service and support?

Apache Airflow's technical support team is good and provides assistance almost 90% of the time.

How was the initial setup?

Apache Airflow's initial setup is easy. It's not that difficult, but it has a learning curve.

What's my experience with pricing, setup cost, and licensing?

Apache Airflow is a cheap solution.

What other advice do I have?

Depending on your use case, if you are looking for a quick solution to work on and know Python, you should go ahead with Apache Airflow.

Apache Airflow is a good enough tool for managing data pipelines. However, the solution is not up to the mark as you scale up and need higher performance. Apache Airflow has introduced the DAG connector for managing data pipelines.

Overall, I rate Apache Airflow an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.