Apache Airflow Overview

Apache Airflow is the #8 ranked solution in BPM Software. PeerSpot users give Apache Airflow an average rating of 8.0 out of 10. Apache Airflow is most commonly compared to Camunda Platform. Apache Airflow is popular among the large enterprise segment, which accounts for 72% of users researching this solution on PeerSpot. The top industry researching this solution is financial services, accounting for 18% of all views.
Apache Airflow Buyer's Guide

Download the Apache Airflow Buyer's Guide including reviews and more. Updated: November 2022

What is Apache Airflow?

Apache Airflow is an open-source workflow management system (WMS) that is primarily used to programmatically author, orchestrate, schedule, and monitor data pipelines and other workflows. The solution lets you manage your data pipelines by authoring workflows as directed acyclic graphs (DAGs) of tasks. By using Apache Airflow, you can orchestrate data pipelines over object stores and data warehouses, run workflows that are not data-related, and create and manage scripted data pipelines as code (Python).
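To make the idea of a directed acyclic graph of tasks concrete, here is a minimal sketch that uses only Python's standard library (`graphlib`), not Airflow's own API. The task names and the `run_order` helper are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_order(deps):
    """Return one valid execution order for the task graph."""
    # TopologicalSorter raises graphlib.CycleError if the graph contains a
    # cycle, which is why a workflow has to be acyclic to be schedulable.
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

A scheduler such as Airflow does considerably more than this (retries, scheduling intervals, parallel execution of independent tasks), but the dependency ordering shown here is the core of what "DAG of tasks" means.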

Apache Airflow Features

Apache Airflow has many valuable key features. Some of the most useful ones include:

  • Smart sensor: Ordinarily, each sensor task occupies a worker slot while it waits for its condition to be met. Smart sensors are instead executed in bundles, and therefore consume fewer resources. (Later Airflow 2.x releases deprecated and then removed smart sensors in favor of deferrable operators.)
  • Dockerfile: By using Apache Airflow’s Dockerfile, you can run your business’s Airflow code without having to document and automate the process of running Airflow on a server.
  • Scalability: Because Apache Airflow has a modular architecture and uses a message queue to orchestrate an arbitrary number of workers, you can easily scale it. 
  • Plug-and-play operators: With Apache Airflow, you can choose from several plug-and-play operators that are ready to execute your tasks on many third-party services.
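The scalability point above (a queue feeding an arbitrary number of workers) can be sketched in a few lines. This is an in-process stand-in built on `queue` and `threading`, assumed here purely for illustration; a real Airflow deployment with the Celery executor would use an external message broker and separate worker processes. The `run_with_workers` helper is hypothetical.

```python
import queue
import threading


def run_with_workers(tasks, num_workers):
    """Run callables from a shared queue on num_workers threads; return sorted results."""
    work = queue.Queue()
    results = []
    lock = threading.Lock()
    for t in tasks:
        work.put(t)

    def worker():
        while True:
            try:
                task = work.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            value = task()
            with lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    # Completion order depends on thread scheduling, so sort for a stable result.
    return sorted(results)


squares = run_with_workers([lambda n=n: n * n for n in range(5)], num_workers=3)
print(squares)
```

Because the workers only share the queue, changing `num_workers` changes throughput without touching the task definitions, which is the property the bullet describes.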

Apache Airflow Benefits

There are many benefits to implementing Apache Airflow. Some of the biggest advantages the solution offers include:

  • User friendly: Getting started with Apache Airflow requires only minimal Python knowledge.
  • Intuitive user interface: The Apache Airflow user interface enables you to visualize pipelines running in production, monitor progress, and also troubleshoot issues when needed.
  • Easy integration: Apache Airflow can easily be integrated with cloud platforms (Google Cloud, AWS, Azure, etc.).
  • Visual DAGs: Apache Airflow’s visual DAGs provide data lineage, which facilitates debugging of data flows and also aids in auditing and data governance. 
  • Flexibility: Apache Airflow provides you with several ways to make DAG objects more flexible. At runtime, each workflow execution receives a context that includes the run ID, execution date, and previous and next run times; these values can be templated into, for example, an SQL statement.
  • Multiple deployment options: With Apache Airflow, you have several options for deployment, including self-service, open source, or a managed service.
  • Several data source connections: Apache Airflow can connect to a variety of data sources, including APIs, databases, data warehouses, and more.  
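The flexibility point above, where per-run context values end up inside an SQL statement, can be illustrated as follows. Airflow itself uses Jinja templating for this; the sketch below substitutes the standard library's `string.Template` to stay dependency-free. The table name, column name, run ID format, and `render_query` helper are all assumptions made for the example.

```python
from datetime import date, timedelta
from string import Template

# Hypothetical query; table and column names are illustrative only.
QUERY = Template(
    "SELECT * FROM events "
    "WHERE event_date >= '$start' AND event_date < '$end' "
    "-- run $run_id"
)


def render_query(run_id, execution_date, next_execution_date):
    """Fill the per-run context values into the SQL text."""
    return QUERY.substitute(
        run_id=run_id,
        start=execution_date.isoformat(),
        end=next_execution_date.isoformat(),
    )


day = date(2022, 11, 1)
sql = render_query("scheduled__2022-11-01", day, day + timedelta(days=1))
print(sql)
```

The values here come from the scheduler rather than from end users; for user-supplied values, parameterized queries are the safer choice than string substitution.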

Reviews from Real Users

Below are some reviews and helpful feedback written by PeerSpot users currently using the Apache Airflow solution.

A Senior Solutions Architect/Software Architect says, “The product integrates well with other pipelines and solutions. The ease of building different processes is very valuable to us. The difference between Kafka and Airflow is that Airflow is better for dealing with the specific flows where we want to do some transformation. It's very easy to create flows.”

An Assistant Manager at a comms service provider mentions, “The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot.”

A Senior Software Engineer at a pharma/biotech company comments that he likes Apache Airflow because it is “Feature rich, open-source, and good for building data pipelines.”

Apache Airflow was previously known as Airflow.

Apache Airflow Customers

Agari, WePay, Astronomer

Apache Airflow Pricing Advice

What users are saying about Apache Airflow pricing:
  • "Although Airflow is open source software, there's also commercial support for it by Astronomer. We personally don't use the commercial support, but it's always an option if you don't mind the extra cost."
  • "We use a community edition of Apache Airflow. It is open source and free."
  • "We are using the open-source version of Apache Airflow."
  • "The pricing for the product is reasonable."
Apache Airflow Reviews

    Senior Solutions Architect/Software Architect at a comms service provider with 51-200 employees
    Real User
    Top 5
    Integrates well with other pipelines and builds different processes well but the scalability needs improvement
    Pros and Cons
    • "The product integrates well with other pipelines and solutions."
    • "The scalability of the solution itself is not as we expected. Being on the cloud, it should be easy to scale, however, it's not."

    What is our primary use case?

    We normally use the solution for creating a specific flow for data transformation. We have several pipelines and, because they are well defined, we use Airflow in conjunction with other tools that handle the mediation portion. Airflow does the processing of that data.

    What is most valuable?

    The product integrates well with other pipelines and solutions.

    The ease of building different processes is very valuable to us. The difference between Kafka and Airflow is that Airflow is better for dealing with the specific flows where we want to do some transformation. It's very easy to create flows.

    What needs improvement?

    The graphics in the past have not been ideal.

    We have several areas where we feel they could improve in terms of being a little bit more flexible. One is implementation. Even though we customized it, there were some specific things we had to do with the image by itself.

    The management integration was challenging as well. It requires a lot of work on our end. We were creating our own way to integrate things specifically with specific tools. There's not really an ease of management out-of-the-box option for integration. We needed to become a little bit creative to solve that ourselves.

    The scalability of the solution itself is not as we expected. Being on the cloud, it should be easy to scale, however, it's not.

    There is no SDC versioning. There's no version control for pipelines. We have to build several pipelines for several flows, yet there's no version control to generate them.

    There's no Python SDK. We need to generate our own scripts and upload them and put them there. However, there's not a realistic case that we can get connected to them. On top of that, the API sets that are provided are very limited. They are not as rich as others. You cannot do much with them.

    For how long have I used the solution?

    I've been using the solution for maybe three years at this point. It hasn't been too long.

    What do I think about the stability of the solution?

    The solution is largely stable. Obviously when you start creating more use cases, then you realize the limitations, however, it's not really, really bad.

    What do I think about the scalability of the solution?

    Due to the fact that the solution is on the cloud, we thought it would be fairly easy to scale. This is proving not to be the case and scalability is limited.

    The challenging part is to make it really flexible in a cloud-native environment. With other applications, what you have there is the scalability that can be sensitive to your needs, based on the amount of data you are putting into the flow.

    Instead of you having to create your own logic to scale it up, it should be a little more efficient on how it gets integrated into the whole environment. You have to get a little bit creative and put some commands and some logic in there and be monitoring everything. You build everything - versus other options that are more out of the box. With other solutions, if you have these bursts of data they ultimately can scale up and they are more native.

    How are customer service and support?

    Technical support has been pretty good. We don't really have anything to complain about. We're satisfied with the service so far.

    Which solution did I use previously and why did I switch?

    For this particular category, we tested the other tools and found them to be more than we needed. We had also used other products in other projects, and nothing really worked for us. Airflow was a bit different, so we decided that it was a nice player and a good open-source tool.

    We do use other tools. However, this one seems to work quite well for us.

    How was the initial setup?

    The initial setup isn't as straightforward as we hoped. It's not as flexible as other options. You need to be a bit creative during the process.

    What's my experience with pricing, setup cost, and licensing?

    This product is open-source.

    What other advice do I have?

    We're just customers and end-users. We don't have a special business relationship with Apache.

    I'm not sure of which version of the solution we're using. It's likely the most up-to-date, or at the very most back two or three versions as we are not using any of the older versions.

    I'd advise others considering the solution to first understand what exactly they're trying to achieve. Otherwise, you risk selecting either a non-cloud-native Apache workflow manager or something that is way too big for what you are actually trying to achieve. Understand exactly what you need, the volumes that you need to handle, and what exactly the use cases are.

    After that, the deployment depends on what exactly you are trying to do. If all of your solutions are cloud-native, try to do it with a cloud-native tool. Specifically, go to the CMCS site and look into the solutions listed there. Those have at least been tested against the cloud-native solutions that exist.

    Then, just make sure that the components you have will match and will be available to whatever you're trying to build. For example, the user management is something that is important for us and for this specific setup. Probably for some others, it's not going to be. 

    Take into consideration, what are the different connection points and make sure that they are either supported or that you can support the integration of such items. You need to have a proper developer that can help you build your connector or your API.

    In general, I would rate the solution at a seven out of ten. If they fix the APIs and the price on LTK, I'd rate it closer to a nine.

    Which deployment model are you using for this solution?

    Public Cloud
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Assistant Manager at a comms service provider with 10,001+ employees
    Real User
    Top 10
    Comes with direct support for Python, letting us easily automate our pipelines
    Pros and Cons
    • "The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot."
    • "We're currently using version 1.10, but I understand that there are a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult."

    What is our primary use case?

    There are a few use cases we have for Apache Airflow, one being government projects where we perform data operations on a monthly basis. For example, we'll collect data from various agencies, harmonize the data, and then produce a dashboard. In general, it's a BI use case, but focusing on social economy.

    We concentrate mainly on BI, and because my team members have strong technical backgrounds we often fall back to using open source tools like Airflow and our own coded solutions. 

    For a single project, we will typically have three of us working on Airflow at a time. This includes two data engineers and a system administrator. Our infrastructure model is hybrid, based both in the cloud and on-premises. 

    What is most valuable?

    The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot.

    It's such a natural fit because our engineers are also Python-based, and I think we also quite like that we don't have to learn different kinds of UIs. Airflow is based on standard software packages, so we don't have to learn anything new in the way of opinionated UIs from different vendors.

    What needs improvement?

    We're currently using version 1.10, but I understand that there are a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult.

    When something fails, it's not that easy to troubleshoot what went wrong. Sometimes the UI becomes really slow and there's no easy way to diagnose the problem. For the most part, we have had to learn through trial and error how to operate it properly. 

    The UI is also not that attractive, and I feel that the user experience isn't that nice. Version 2 is supposedly better, but without having tried it, I could suggest more improvements in the visual UI. We want to do the ETL as code, but having a nice visual UI to facilitate this process would be great, because it would mean we could also rely on non-technical staff, rather than just the three solid technical staff we have here. If there were better features for the UI, like drag-and-drop, then we could expand its use to more of our team.

    For how long have I used the solution?

    I've been using Apache Airflow for about two and a half years. 

    What do I think about the stability of the solution?

    I think how Apache Airflow works is great. We like the paradigm of ETL as code, which means you define your pipeline as code. All the while, people talk about infrastructure as code, so the practice of ETL as code really fits into that philosophy.

    What do I think about the scalability of the solution?

    We can scale it well, and it runs on cloud, too. It's compatible with cloud-native technologies like Kubernetes so it has no issues regarding elasticity.

    How are customer service and technical support?

    We contacted an Airflow developer for assistance once and it was a good experience.

    Which solution did I use previously and why did I switch?

    We like to explore different tools, mixing and matching them to our needs, but we have never really found any like Airflow that are to our liking. We tried looking into Talend and Alteryx but we didn't find them suitable to our style or approach.

    How was the initial setup?

    As a first-time user, it was complex and somewhat difficult to set up as there are many components to put together. You've got your data portion, your scheduler portion, your web server portion, etc., and you've got all these parts to set up at first.

    The next project that you get to, it gets easier. You really need to acquire a feel for what you're doing, and once you get over that, it's not too bad.

    What about the implementation team?

    We implemented Airflow ourselves, with the help of our two in-house data engineers and system administrator. It took around three months to get it deployed initially, from concept into production. Then after that, the goal is just to operate it and keep it running.

    What's my experience with pricing, setup cost, and licensing?

    Although Airflow is open source software, there's also commercial support for it by Astronomer. We personally don't use the commercial support, but it's always an option if you don't mind the extra cost.

    What other advice do I have?

    I can recommend Apache Airflow, especially if there are serious data engineers on your team. If, on the other hand, you're looking to enable business users, then it's not suitable.

    I would rate Apache Airflow an eight out of ten.

    Which deployment model are you using for this solution?

    Hybrid Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Other
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Nomena NY HOAVY - PeerSpot reviewer
    Lead Data Scientist at MVola
    Real User
    Top 10
    An easy to implement and flexible solution
    Pros and Cons
    • "The solution is flexible for all programming languages for all frameworks."
    • "Apache Airflow could be improved by integrating some versioning principles."

    What is our primary use case?

    Currently, I am a lead data scientist. Our primary use cases for Apache Airflow are for all orchestrations, from the basic big data lake to machine learning predictions. It is used for all the MLS processes. It is also used for some ELT, to transform, load, and export all big data from restricted, unrestricted, and all phase processes.

    What is most valuable?

    The user experience of Apache Airflow is good. The solution is flexible for all programming languages for all frameworks. I also value that it is used for monitoring. Apache Airflow helps to easily integrate data sources with other products.

    What needs improvement?

    Apache Airflow could be improved by integrating some versioning principles. Currently, we have to swap some tags in our flow. It would be useful if we could version the whole project at once and compare which scripts have changed from last year to this year, or from last month to this month.

    For example, to version the flow for one project, we need to check it piece by piece to identify which tags and which scripts changed. All of this has to be done manually.
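The manual comparison described in this review could be partly automated. The following is a minimal sketch, not an Airflow feature, that fingerprints two snapshots of a project's scripts and lists the ones that differ. The script names, contents, and both helper functions are hypothetical; in practice, keeping DAG files in a version control system such as Git gives the same information with full history.

```python
import hashlib


def fingerprint(scripts):
    """Map each script name to a SHA-256 digest of its contents."""
    return {
        name: hashlib.sha256(text.encode("utf-8")).hexdigest()
        for name, text in scripts.items()
    }


def changed_scripts(old, new):
    """Return the sorted names of scripts that were added, removed, or modified."""
    old_fp, new_fp = fingerprint(old), fingerprint(new)
    names = set(old_fp) | set(new_fp)
    return sorted(n for n in names if old_fp.get(n) != new_fp.get(n))


# Hypothetical snapshots of a project's scripts at two points in time.
last_month = {"load.py": "print('v1')", "clean.py": "print('clean')"}
this_month = {
    "load.py": "print('v2')",
    "clean.py": "print('clean')",
    "export.py": "print('new')",
}

print(changed_scripts(last_month, this_month))
```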

    For how long have I used the solution?

    I have been using Apache Airflow for four months.

    What do I think about the stability of the solution?

    We have experienced some bugs in Airflow. For example, the solution did not mention all the errors regarding why a process did not work. We had to investigate to try and understand why it was not working.

    What do I think about the scalability of the solution?

    The solution is easy to scale. We have four people in our organization that use Airflow. One is dedicated to the solution, while the others can use it to adjust the flow of their jobs on their own.

    How are customer service and support?

    We do not use technical support. We are trained to resolve concerns on our own. If a problem is significant we could call support, however, there is a good developer community that uses Airflow that can help resolve the issue with us.

    How would you rate customer service and support?

    Positive

    Which solution did I use previously and why did I switch?

    Prior to using Airflow, I used Windows SSIS for three years. We made the switch because Windows SSIS uses the drag-and-drop concept, whereas Airflow requires coding. Also, Windows is oriented to Microsoft products and is not very flexible.

    How was the initial setup?

    I am a technician, so the initial setup is instinctive. Without experience, it would not be as simple. Experience with configurations and parameters is required. The documentation is good; however, it does not mention some features explicitly, so some research is required.

    I would rate the ease of implementation a three out of five.

    What about the implementation team?

    We have dedicated machine learning ops, so we manage all product deployment ourselves. The deployment takes about four days, including two days of administration. 

    Apache Airflow requires maintenance. It is very important to maintain all the source codes and all the data. We are looking for a platform that would facilitate the maintenance of the project.

    What's my experience with pricing, setup cost, and licensing?

    We use a community edition of Apache Airflow. It is open-source and free. 

    What other advice do I have?

    Anyone considering Apache Airflow should make sure that they have a good team with experience, including some administration. A strong background will help to understand and exploit the strengths of the platform.

    I would rate this solution a nine out of 10 overall.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Senior Software Engineer at a pharma/biotech company with 1,001-5,000 employees
    Real User
    Top 10
    Feature rich, open-source, and good for building data pipelines
    Pros and Cons
    • "I like the UI rework, it's much easier."
    • "I would like to see it more friendly for other use cases."

    What is our primary use case?

    I'm a data engineer. In the past, I used Airflow for building data pipelines and to populate data warehouses. With my current company, it's a data product or datasets that we sell to biopharma companies.

    We are using those pipelines to generate those datasets.

    What is most valuable?

    I like the UI rework, it's much easier.

    I use XCom for derived variables that need to pass between tasks. I don't really tend to use it for passing data, only for derived variables, so that, for example, I don't have to re-query something in every task that uses it. I use the JSON conf for overriding certain parameters.

    In our use cases, some of the inputs of the dataset are files that we pull out of S3. Sometimes those files need to be redone, but we don't need to change any logic; we just need to redo the builds. Rather than redeploying the code to point to a new S3 bucket, we override the parameter to point to a different S3 key.
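The pattern this reviewer describes, overriding a parameter at trigger time with a JSON payload instead of redeploying code, can be sketched in plain Python. This is an illustration of the idea only, not Airflow's API; the bucket name, key names, and `resolve_params` helper are hypothetical.

```python
import json

# Hypothetical defaults baked into the deployed pipeline code.
DEFAULT_PARAMS = {"s3_bucket": "example-bucket", "s3_key": "inputs/2022-10.csv"}


def resolve_params(defaults, conf_json=None):
    """Merge a JSON run configuration over the defaults without mutating them."""
    overrides = json.loads(conf_json) if conf_json else {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Reject typos early rather than silently ignoring them.
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**defaults, **overrides}


params = resolve_params(DEFAULT_PARAMS, '{"s3_key": "inputs/2022-10-redo.csv"}')
print(params)
```

Only the overridden key changes for that run; the deployed defaults stay as they are for every later run.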

    I have read that there are many different workflow pipelining tools in the biotech space, such as Snakemake and Nextflow.

    There is also a CWL plugin that we may look into at some point. 

    Eventually, we might have a use case where a researcher has a pipeline they run locally, and then we want to convert that to a DAG. 

    The CWL-Airflow plugin would be useful for that. This might be something to look into later. But that would be like months, or maybe a year from now.

    What needs improvement?

    I am using a Celery Executor and I find that it crashes and I can't see any logs. I can only assume that it's a memory issue and have to blindly restart until eventually, it starts up again.

    One of the use cases is triggered by input rather than a batch process. For example, when we receive a batch of data, it goes through tasks one, two, and three. When a new batch comes in, each subsequent task should operate on just that batch's data from the prior task.

    I am used to working with it such that the output gets written to a table and then the next task selects everything from that upstream table. It could be coded so that you only write the data for that portion of the task. It could handle state machines and state changes as opposed to the batch process.
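One way to get the behavior this reviewer asks for is to key every intermediate result by a batch identifier, so that a downstream task reads only the rows of the batch it is processing instead of selecting the whole upstream table. The sketch below uses an in-memory dictionary as a stand-in for the intermediate tables; the table names, task functions, and helpers are all hypothetical.

```python
from collections import defaultdict

# In-memory stand-in for intermediate tables: table name -> batch id -> rows.
tables = defaultdict(dict)


def run_task(name, upstream, batch_id, fn):
    """Read only this batch's rows from upstream and write results under the same batch id."""
    rows = tables[upstream].get(batch_id, [])
    tables[name][batch_id] = [fn(r) for r in rows]


def process_batch(batch_id, rows):
    """Push one incoming batch through two dependent tasks."""
    tables["raw"][batch_id] = list(rows)
    run_task("task_one", "raw", batch_id, lambda x: x + 1)
    run_task("task_two", "task_one", batch_id, lambda x: x * 10)
    return tables["task_two"][batch_id]


first = process_batch("batch-1", [1, 2])
second = process_batch("batch-2", [5])
print(first, second)
```

The second batch never touches the first batch's rows, even though both share the same tables.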

    I would like to see it more friendly for other use cases.

    For how long have I used the solution?

    In my current company, I just introduced it within the last couple of months. But I've used it at my prior two jobs as well.

    We are using Version 2.0.1.

    What's my experience with pricing, setup cost, and licensing?

    We are using the open-source version of Apache Airflow.

    What other advice do I have?

    I usually create my own custom operators every time. We upgraded to 2.0, but I am not using any of the new features. 

    I haven't yet used DAG of DAGs or the new way of using Python functions in the Python operator. But we might use DAG of DAGs eventually.

    I love this solution and I would rate it a nine out of ten.

    Which deployment model are you using for this solution?

    Private Cloud

    If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

    Amazon Web Services (AWS)
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Mahendra Prajapati - PeerSpot reviewer
    Senior Data Analytics at a media company with 1,001-5,000 employees
    Real User
    Top 5, Leaderboard
    A customizable solution, but the integration process could be simplified
    Pros and Cons
    • "The best feature is the customization."
    • "The solution could be improved by simplifying the integration process."

    What is our primary use case?

    Our primary use case for this solution is scheduling task rates. We capture the data from the SQL Server location and migrate it to the central data warehouse.

    What is most valuable?

    The best feature is the customization that can be done using Python. For example, there are use cases where we have to tweak the algorithm and with Apache Script Rate, we have extra functionality that helps to change the underlying process. We can define our algorithms and processes using Python.

    What needs improvement?

    The solution could be improved by simplifying the integration process and providing access to its support team to guide integration.

    For how long have I used the solution?

    We have been using this solution for two months and it is deployed on-premises.

    What do I think about the stability of the solution?

    The solution is stable but primarily depends on the support team and how they manage it.

    What do I think about the scalability of the solution?

    Apache Airflow is scalable. Approximately 20 people use this solution on my team.

    How are customer service and support?

    We haven't had any experience with customer service and support.

    Which solution did I use previously and why did I switch?

    Previously, we were using SQL Server integration tools and SQL Server Integration Services (SSIS) packages. We had project orders and wanted to migrate everything, as Airflow is open source and no license is required. We switched to Apache Airflow because we are trying to migrate all the projects developed in SSIS using Python.

    How was the initial setup?

    The initial setup was straightforward. Once a script is written, it takes four to five minutes to set up.

    What's my experience with pricing, setup cost, and licensing?

    Apache Airflow is open source, so I cannot comment on licensing costs.

    Which other solutions did I evaluate?

    We chose this solution because it was suitable for our business needs.

    What other advice do I have?

    I rate this solution a seven out of ten. My advice to new users is to have good proficiency with the Python language. The solution is good but can be improved by simplifying its integration process.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Associate Director - Technologies at a tech services company with 51-200 employees
    Real User
    Top 10
    Quick and easy to set up, but the technical support needs to be improved

    What is our primary use case?

    Our primary use case is to integrate with SLAs.

    What is most valuable?

    The most valuable feature is the workflow.

    What needs improvement?

    Technical support is an area that needs improvement. The contact numbers should be readily available so that we can call to get support as required.

    In the future, I would like to see a single-click installation.

    For how long have I used the solution?

    We have been working with Apache Airflow for approximately one month.

    What do I think about the scalability of the solution?

    In our company, we are doing a POC and there are only three users. We have also implemented it for clients.

    We do plan to increase our usage and the POC that we are now working on is something that we will implement for other clients if it works.

    How are customer service and technical support?

    We are not satisfied with technical support. We rely on using Google to identify solutions for the problems we have.

    Which solution did I use previously and why did I switch?

    We did not use another similar solution prior to Airflow.

    How was the initial setup?

    The initial setup was straightforward and it does not take long to complete. The deployment took no more than an hour.

    Which other solutions did I evaluate?

    We evaluated Control-M and another similar product from IBM.

    What other advice do I have?

    This is a good product and I definitely recommend it.

    I would rate this solution a seven out of ten.

    Which deployment model are you using for this solution?

    On-premises
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Virksomhedskonsulent - Digitalisering, Forretningsudvikling, BPM, Teknologi & Innovation at a consultancy with 51-200 employees
    Real User
    Top 10
    Scalable, stable and simple installation
    Pros and Cons
    • "We have been quite satisfied with the stability of the solution."
    • "The way the dashboard is connected to the BPM flow could be improved."

    What is our primary use case?

    We mainly used the solution in banking, finance, and insurance. We are looking for some opportunities in production companies, but this is only at the very early stages.

    What is most valuable?

    I do not have specific feedback because it is quite early in the review stage for comment.

    What needs improvement?

    The way the dashboard is connected to the BPM flow could be improved.

    For how long have I used the solution?

    I have been using the solution for half a year.

    What do I think about the stability of the solution?

    We have been quite satisfied with the stability of the solution.

    What do I think about the scalability of the solution?

    The scalability of the solution is good.

    How are customer service and technical support?

    We had no issue with technical support.

    How was the initial setup?

    The installation is straightforward.

    What's my experience with pricing, setup cost, and licensing?

    The pricing for the product is reasonable.

    Which other solutions did I evaluate?

    We are evaluating Camunda as well as this solution. We are investigating and trying to determine how suitable they are for production facilities. Additionally, we are looking at which types of processes the solutions are actually suitable for.

    What other advice do I have?


    We are unsure of which solution we will end up with; we are testing them currently. We are trying to get into new business types and new industries. We are looking into how well the solutions can be used in production facilities.

    I rate Apache Airflow an eight out of ten.

    Which deployment model are you using for this solution?

    Public Cloud
    Disclosure: I am a real user, and this review is based on my own experience and opinions.
    Ariful Mondal - PeerSpot reviewer
    Consulting Practice Partner - Data, Analytics & AI at FH
    Real User
    Expert, Moderator
    Managing large-scale data pipelines and Python tasks has been made easy

    We have been using Apache Airflow for the past 2 years for various use cases such as: 

    • Data Pipeline building and monitoring
    • Automation of data extraction processes and Intelligent Automation
    • Web Scraping at scale for financial services 

    We manage large-scale data processing workloads using DAGs (directed acyclic graphs), a core concept of Airflow (as Apache Airflow is commonly known), which expedites error handling and logging. This has helped us manage complex workflows and orchestrate tasks efficiently.

    I found the following features very useful:

    • DAG - Workload management and orchestration of tasks using 
    • TaskFlow API - moving Python tasks has been made easy, with cleaner DAGs using the @task decorator in Python
    • Connections and Hooks - interfaces for connecting to external systems

    To implement the various useful functionalities of Airflow effectively, you need to be a very good Python programmer. The UI could be improved with additional user-friendly features for non-programmers, reducing the amount of coding required of practitioners.

    Disclosure: I am a real user, and this review is based on my own experience and opinions.