Azure Data Factory Overview

Azure Data Factory is the #1 ranked solution in top Data Integration Tools and the #2 ranked solution in top Cloud Data Warehouse tools. PeerSpot users give Azure Data Factory an average rating of 8.0 out of 10. Azure Data Factory is most commonly compared to Informatica PowerCenter. Azure Data Factory is popular among the large enterprise segment, which accounts for 71% of users researching this solution on PeerSpot. The top industry researching this solution is computer software, accounting for 20% of all views.
Azure Data Factory Buyer's Guide

Download the Azure Data Factory Buyer's Guide including reviews and more. Updated: October 2022

What is Azure Data Factory?

Create, schedule, and manage your data integration at scale with Azure Data Factory - a hybrid data integration (ETL) service. Work with data wherever it lives, in the cloud or on-premises, with enterprise-grade security.

Azure Data Factory Customers

Milliman, Pier 1 Imports, Rockwell Automation, Ziosk, Real Madrid

Azure Data Factory Pricing Advice

What users are saying about Azure Data Factory pricing:
"Pricing is comparable, it's somewhere in the middle."

Azure Data Factory Reviews

GaryM - PeerSpot reviewer
Data Architect at World Vision
Real User
Top 5 Leaderboard
There's the good, the bad and the ugly....unfortunately lots of ugly
Pros and Cons
  • "The trigger scheduling options are decently robust."
  • "There is no built-in pipeline exit activity when encountering an error."

What is our primary use case?

Current use is for extracting data from Google Analytics into an Azure SQL database as a source for our EDW. Extracting from GA was problematic with SSIS.

The larger use case is to assess the viability of the tool for larger use in our organization as a replacement for SSIS for our EDW and also as an orchestration agent to replace SQLAgent for firing SSIS packages using Azure SSIS-IR.

The initial rollout was to solve the immediate problem while assessing its ability to be used for other purposes within the organization. It was also meant to establish the development and administration pipeline process.

How has it helped my organization?

ADF allowed us to extract Google Analytics data (via BigQuery) without purchasing an adapter.

It has also helped with establishing how our team can operate within Azure using both PaaS and IaaS resources and how those can interact.  Rolling out a small data factory has forced us to understand more about all of Azure and how ADF needs to rely upon and interact with other Azure resources.

It is providing a learning ground for use of DevOps Git along with managing ARM templates as well as driving the need to establish best practices for CI.  

What is most valuable?

The most valuable aspect has been a large list of no-cost source/target adapters.

It is also providing a PaaS ELT solution that integrates with other Azure resources. 

Its graphical UI is very good and is even now improving significantly with the latest preview feature of displaying inner activities within other activities such as ForEach and If condition.   

Its built-in monitoring and ability to see each activity's JSON inputs/outputs provides an excellent audit trail.

The trigger scheduling options are decently robust.

The fact that it's continually evolving gives hope that even if some feature is missing today, the gap may soon be closed. For example, it lacked support for a simple SQL activity until earlier this year, when that was resolved.

The Copy Activity Upsert option did not function when I first started using the tool but now seems to function quite well.  

It is built to be metadata driven to do large numbers of patterned ETL processes similar to what BIML provides for SSIS but much simpler to use than BIML. 
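As a rough illustration of what "metadata driven" means here, the sketch below expands a small list of table descriptions into one copy task per table, much as a single parameterized pipeline might be driven by a control table. It is plain Python rather than ADF pipeline JSON, and every name in it (TableSpec, build_copy_tasks, the schema and table names) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """One row of hypothetical control metadata describing a table to copy."""
    source_schema: str
    source_table: str
    target_table: str
    key_column: str


def build_copy_tasks(specs):
    """Expand metadata rows into one copy-task description per table."""
    tasks = []
    for spec in specs:
        tasks.append({
            "name": f"Copy_{spec.source_table}",
            "source_query": f"SELECT * FROM {spec.source_schema}.{spec.source_table}",
            "sink_table": spec.target_table,
            "upsert_key": spec.key_column,
        })
    return tasks


# Hypothetical metadata; in practice this would live in a control table or file.
specs = [
    TableSpec("ga", "sessions", "stg.sessions", "session_id"),
    TableSpec("ga", "pageviews", "stg.pageviews", "pageview_id"),
]

for task in build_copy_tasks(specs):
    print(task["name"], "->", task["sink_table"])
```

Adding a table then means adding a metadata row rather than building another pipeline by hand, which is the appeal of the pattern.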

What needs improvement?

The list of issues and gaps in this tool is extensive, although as time goes on it gets shorter. It currently includes:

1) Missing email/SMTP activity.

2) Mapping data flows incur significant lag time while Spark clusters spin up.

3) Performance compared to SSIS. Expect copy activity to take ten times that of what SSIS takes for simple data flow between tables in the same database.

4) It's missing a debug of a single activity.  The workaround is setting a breakpoint on the task and doing a "rerun from activity". 

5) OAuth 2.0 adapters lack automated support for refresh tokens.

6) Copy activity errors provide no guidance as to which column is causing a failure.

7) There's no built-in pipeline exit activity when encountering an error.

8) AutoResolveIntegration runtime should never pick a region that you're not using (should be your default for your tenant). 

9) Resolve IR queue time lag.  For example a small table copy activity I just ran took 95 seconds queuing and 12 seconds to actually copy the data. 
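To put the figures in item 9 in proportion, here is a small sketch of the arithmetic, using only the two numbers reported above (95 seconds queued, 12 seconds copying). The function name is just for illustration.

```python
def queue_overhead(queue_seconds, copy_seconds):
    """Return the share of total run time spent waiting in the queue."""
    total = queue_seconds + copy_seconds
    return queue_seconds / total


# Figures from the copy activity described above.
share = queue_overhead(95, 12)
print(f"Total run time: {95 + 12} s")
print(f"Time spent queuing: {share:.1%}")
```

In other words, close to nine tenths of that 107-second run was spent waiting rather than moving data.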

They need to fix the bugs, for example:

1) Debug sometimes stops picking up saved changes for a period of time, rendering this essential tool useless during that time.

2) Enable interactive authoring (a critical tool for development) often doesn't turn on when enabled without going into another part of the tool to enable it. And then you have to wait several minutes before it's enabled, which is time you're blocked from development until it's ready. And then it only activates for up to 120 minutes before you have to go through this all over again. I think Microsoft is trying to torture developers.

3) Exiting the inside of an activity that contains other activities always causes the screen to jump to the beginning of a pipeline requiring re-navigating to where you were at (greatly slowing development productivity). 

4) AutoResolveIntegration runtime (using default settings) often picks remote regions to operate, which causes either an unnecessary slowdown or an error message saying it's unable to transfer the volume of data across regions.

5) Copy activity often gets error "mapping source is empty" for no apparent reason. If you play with the activity such as importing new metadata then it's happy again.  This sort of thing makes you want to just change careers. Or tools. 

For how long have I used the solution?

I have been using this product for 6 months.

What do I think about the stability of the solution?

Production operation seems to run reliably so far; however, the development environment seems very buggy, where something works one day and not the next.

What do I think about the scalability of the solution?

So far, the performance of this solution is abysmal compared to SSIS. This is especially true for small tasks such as a copy activity from one table to another within the same database.

How are customer service and support?

Non-existent.  Logged multiple issues only to hear back from 1st level support weeks later asking questions and providing no help other than wasting my time. In one situation it was a bug where the debug function stopped working for a couple of days.  By the time they got back to me the problem went away. 

How would you rate customer service and support?

Negative

Which solution did I use previously and why did I switch?

We have been and still rely on SSIS for our ETL. ADF seems to do ELT well but I would not consider it for use in ETL at this time.  Its mapping data flows are too slow (which is a large understatement) to be of practical use to us. Also, the ARM template situation is impractical for hundreds of pipelines like we would have if we converted all our SSIS packages into pipelines as a single ADF couldn't take on all our pipelines. 

How was the initial setup?

Initial setup is the largest caveat for this tool. Once you've organized your Azure environment and set up DevOps pipelines, the rest is a breeze. But this is NOT a trivial step if you're the first one to establish use of ADF at your organization or within your subscription(s). Instead of learning just an ETL tool, you have to get familiar with and establish best practices for the entire Azure and DevOps technologies. That's a lot to take on just to get some data movements operational.

What about the implementation team?

I did this in-house with assistance from another team who has been using DevOps with Azure for other purposes (non-ADF use).

What's my experience with pricing, setup cost, and licensing?

The setup cost is only the time it takes to organize Azure resources so you can operate effectively and figure out how to manage different environments (dev/test/sit/uat/prod, etc.). You also need to work out how to enable multiple developers to work on a single data factory without losing changes or conflicting with other changes.

Which other solutions did I evaluate?

We operate only with SSIS today, and it works very well for us. However, looking toward the future, we will need to eventually find a PaaS solution that will have longer sustainability.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Richard Domikis - PeerSpot reviewer
Chief Technology Officer at cornerstone defense
Real User
Top 5
Easy to bring in outside capabilities, flexible, and works well
Pros and Cons
  • "It is very modular. It works well. We've used Data Factory and then made calls to libraries outside of Data Factory to do things that it wasn't optimized to do, and it worked really well. It is obviously proprietary in regards to Microsoft created it, but it is pretty easy and direct to bring in outside capabilities into Data Factory."
  • "There is always room to improve. There should be good examples of use that, of course, customers aren't always willing to share. It is Catch-22. It would help the user base if everybody had really good examples of deployments that worked, but when you ask people to put out their good deployments, which also includes me, you usually got, "No, I'm not going to do that." They don't have enough good examples. Microsoft probably just needs to pay one of their partners to build 20 or 30 examples of functional Data Factories and then share them as a user base."

What is our primary use case?

Our customers use it for data analytics on a large volume of data. So, they're basically bringing data in from multiple sources, and they are doing ETL extraction, transformation, and loading. Then they do initial analytics, populate a data lake, and after that, they take the data from the data lake into more on-premise complex analytics.

Its version depends on a customer's environment. Sometimes, we use the latest version, and sometimes, we use the previous versions.

What is most valuable?

It is very modular. It works well. We've used Data Factory and then made calls to libraries outside of Data Factory to do things that it wasn't optimized to do, and it worked really well. It is obviously proprietary in regards to Microsoft created it, but it is pretty easy and direct to bring in outside capabilities into Data Factory.

It is very flexible. You can build any features you want.

What needs improvement?

There is always room to improve. There should be good examples of use that, of course, customers aren't always willing to share. It is Catch-22. It would help the user base if everybody had really good examples of deployments that worked, but when you ask people to put out their good deployments, which also includes me, you usually got, "No, I'm not going to do that." They don't have enough good examples. Microsoft probably just needs to pay one of their partners to build 20 or 30 examples of functional Data Factories and then share them as a user base.

For how long have I used the solution?

I have been using this solution for the last five years, but probably, the last three years have been significant.

What do I think about the stability of the solution?

It has been stable. I have not experienced any issues.

What do I think about the scalability of the solution?

It is decent for most things. I'm not sure if it is necessarily intended for large volume and high-speed streams of data. By large, I mean really big, but for pretty much anything that most users would want to do, including ourselves, it is fine. Our clients are large government organizations.

It scales fine within its environment. You can literally throw another Data Factory in or replicate one and do things pretty quickly. So, it is not at all hard to increase your processing footprint, but you have to pay for it. It doesn't end up being quite expensive. Although I haven't really done it, I would suspect that if I did the equivalent in AWS, Azure would be more expensive than AWS because of the way they price data.

How are customer service and technical support?

They're all right. I would rate them a seven out of 10. They do fine, but there is a lot that they don't do.

I'm not sure if even Microsoft has enough SMEs from a user point of view. They are helpful for getting it set up, making it work, and helping you figure out why it doesn't work. If you want to ask them about something that you are trying to do, they'll try to direct you to a partner, which is fine, but the partners also don't necessarily have the experience. It is a Catch-22. There aren't a lot of people out there with Azure experience because Azure started to be in demand only over the last two years.

Which solution did I use previously and why did I switch?

The customer used a lot of homebrew stuff. They were doing a lot of internal stuff and some Oracle stuff. They were doing things, and they made a workaround and said, "Okay, we'll bring it into Oracle Database, and then we'll do all these things to it." We're like, "Okay, that works, but then you're taking it out of that database and putting it over into the data lake. I don't understand why you are doing that." That's what they were doing.

How was the initial setup?

It is pretty straightforward. Devil is in the details, but you can easily get up and running in a day with Data Factory. Anybody who is comfortable in Azure can set up Data Factory, but it takes experience to know what it can and can't do or should and shouldn't do.

What other advice do I have?

It is proven, and it works. Make sure you have a well-defined use case and build a quick prototype to ensure that it, in fact, does what you need. Give yourself some benchmarks. That's exactly what we did. We defined the use case, and then we set up Data Factory. We found a couple of things that it didn't do. We figured out a way to work around those things and have it do those things. After that, we confirmed it. It is operational, and it is doing its job. It has been pretty much error-free since then.

It would become easier to use as more people become Azure-capable. If I want to find an AWS SME, I can get tons. They're expensive, but I have them. If I want to find an Azure SME, I usually have to create them. Azure was later to market than AWS. So, there are fewer people who are experts in Azure, and they are in high demand.

I would rate Azure Data Factory a nine out of 10. They just don't have enough good examples out there of things.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Brian Sullivan - PeerSpot reviewer
Chief Analytics Officer at Idiro Analytics
Real User
Top 20
I like that we can set up the security protocols for IP addresses
Pros and Cons
  • "It's cloud-based, allowing multiple users to easily access the solution from the office or remote locations. I like that we can set up the security protocols for IP addresses, like allow lists. It's a pretty user-friendly product as well. The interface and build environment where you create pipelines are easy to use. It's straightforward to manage the digital transformation pipelines we build."
  • "Data Factory has so many features that it can be a little difficult or confusing to find some settings and configurations. I'm sure there's a way to make it a little easier to navigate."

What is our primary use case?

We use Data Factory for automating ETL processes, data management, digital transformation, and scheduled automated processes. My team has about 11 people, and at least five use Data Factory. It's mostly data engineers and analysts. 

Each data analyst and engineer manages a few projects for clients. Typically, it's one person per client, but we might have two or three people managing and building out pipelines for a larger project.  

What is most valuable?

It's cloud-based, allowing multiple users to easily access the solution from the office or remote locations. I like that we can set up the security protocols for IP addresses, like allow lists. It's a pretty user-friendly product as well. The interface and build environment where you create pipelines are easy to use. It's straightforward to manage the digital transformation pipelines we build.

What needs improvement?

Data Factory has so many features that it can be a little difficult or confusing to find some settings and configurations. I'm sure there's a way to make it a little easier to navigate.

In the main ADF web portal, there's a section for monitoring jobs that are currently running so you can see if recent jobs have failed. There's an app for working with Azure in general where you can look at some segs in your account. It would be nice if Azure had an app that lets you access the monitoring layer of Data Factory from your phone or a tablet, so you could do a quick check-in on the status of certain jobs. That could be useful.

For how long have I used the solution?

We've been using Azure Data Factory for about three years.

What do I think about the stability of the solution?

I've been happy with it overall. I don't think we've had any major issues. We've been able to do what we needed, whether connecting to different data sources or setting up different types of transformations and processes. 

What do I think about the scalability of the solution?

It's a cloud solution, so it's inherently scalable. I don't know if we have to raise the limits on resources like clusters and processing power or if it will just automatically scale up. I can't remember offhand.

Which solution did I use previously and why did I switch?

We managed the same actions with a combination of tools. We used SFTP servers to move data from one place to another. We used scripts for loading and some other stored procedures or processes for data transformation within a database. It took two or three pieces of technology or systems to manage the same types of operation. Data Factory lets us consolidate those steps into a single pipeline. 

How was the initial setup?

Setting up Azure Data Factory is pretty straightforward. We had an Azure account already, and Data Factory was just something we could add as an extra service. We had to create instances and pipelines, and it took us about two weeks to get our first pipelines scheduled and running. 

What about the implementation team?

We do everything in-house.

What was our ROI?

We see a return on Data Factory if we compare the time and effort that would be necessary to perform the equivalent processes manually. 

What's my experience with pricing, setup cost, and licensing?

I'm not too familiar with the cost, but I believe we're reasonably happy with what we're paying. My understanding is that the cost of Data Factory is tied to consumption. It depends on the amount of data or the number of pipelines running, and the cost varies from month to month depending on the usage. 

You'll obviously pay more if you're scheduling heavy digital transformation processes to run every hour, but I don't think there are any other hidden costs or anything extra. When you set up a new account, you have a trial period that enables you to create a test pipeline or process that's typical of your use case and then do a benchmark test to see if Data Factory can achieve the efficiency you need. You'll also get some idea of how much the process will cost to run. From there, it's straightforward to do a cost evaluation or comparison to see if it's the right fit for your company. 
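The cost evaluation described above can be sketched as a simple extrapolation from a benchmark run. This is only an illustration: the per-run cost below is a made-up placeholder, not a real Azure price, and the function name is hypothetical. The idea is to substitute the figure you actually observe during the trial.

```python
def estimate_monthly_cost(cost_per_run, runs_per_day, days_per_month=30):
    """Extrapolate a measured per-run cost to a monthly figure."""
    return cost_per_run * runs_per_day * days_per_month


# Placeholder value: substitute the per-run cost observed in your own benchmark.
measured_cost_per_run = 0.25  # hypothetical, in your billing currency

hourly = estimate_monthly_cost(measured_cost_per_run, runs_per_day=24)
daily = estimate_monthly_cost(measured_cost_per_run, runs_per_day=1)

print(f"Hourly schedule: {hourly:.2f} per month")
print(f"Daily schedule:  {daily:.2f} per month")
```

Because the cost is tied to consumption, the same pipeline scheduled hourly costs 24 times what it does on a daily schedule in this simple model.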

Which other solutions did I evaluate?

We were looking for a single solution, and Data Factory was the first one that interested us. I don't think we looked at many others. We were pretty set on Azure, and Data Factory seemed to fit our needs, so we didn't make a full comparison with the alternatives.

What other advice do I have?

I rate Azure Data Factory nine out of 10. It isn't perfect, but it's solid. Data Factory has improved how we deal with various aspects of Azure. It has always met our needs in terms of the transformations and jobs we want to create and schedule. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Dan_McCormick - PeerSpot reviewer
Chief Strategist & CTO at a consultancy with 11-50 employees
Real User
Top 10
Secure and reasonably priced, but documentation could be improved and visibility is lacking
Pros and Cons
  • "The most valuable feature of Azure Data Factory is that it has a good combination of flexibility, fine-tuning, automation, and good monitoring."
  • "They require more detailed error reporting, data normalization tools, easier connectivity to other services, more data services, and greater compatibility with other commonly used schemas."

What is our primary use case?

We use Azure Data Factory for data transformation, normalization, bulk uploads, data stores, and other ETL-related tasks.

How has it helped my organization?

Azure Data Factory allows us to create data analytic stores in a secure manner, run machine learning on our data, and easily adapt to changing schema.

What is most valuable?

The most valuable feature of Azure Data Factory is that it has a good combination of flexibility, fine-tuning, automation, and good monitoring.

What needs improvement?

The documentation could be improved. They require more detailed error reporting, data normalization tools, easier connectivity to other services, more data services, and greater compatibility with other commonly used schemas.

I would like to see a better understanding of other common schemas, as well as a simplification of some of the more complex data normalization and standardization issues.

It would be helpful to have visibility, or better debugging, and see parts of the process as they cycle through, to get a better sense of what is and isn't working.

It's essentially just a black box. There is some monitoring that can be done, but when something goes wrong, even simple fixes are difficult to troubleshoot.

For how long have I used the solution?

I have been working with Azure Data Factory for a couple of years.

There is only one version.

What do I think about the stability of the solution?

Overall, I believe the stability has been good, but there have been a couple of occasions when the Microsoft resources that needed to be allocated were overburdened, and we had to wait for unacceptable amounts of time to get our slot. It has now happened twice, which is not ideal.

What do I think about the scalability of the solution?

There is no limit to scalability.

We only have a few users. One is a data scientist, and the other is a data analyst.

We use it to push up various dashboards and reports; it's a transitional product for transferring, transforming, and transitioning data.

It is extensively used, and we intend to expand our use.

How are customer service and support?

You don't really get that kind of support; it's more about documentation and the community support that is available. I would rate it a three out of five compared to others.

You could call them, and pay for their consulting hours directly, but for the most part, we try to figure it out or look through documentation. 

I think their documentation is lagging because it's not as popular of a tool, there's just not a lot, or as much to fall back on.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We had only our own tools, and we switched because you get to leverage all of the work done in a SaaS or platform as a service, or however they classify it. As a result, you get more functionality, faster, for less money.

How was the initial setup?

The initial setup is straightforward.

It is a working tool. You can start using it within an hour and then make changes as needed.

We only need one person to maintain the solution; it doesn't take much to keep it running.

It's not a problem; it's a platform.

What about the implementation team?

We completed the deployment ourselves.

What was our ROI?

We have seen a return on investment. I can't really share many details, but for us, this becomes something that we sell back to our clients.

What's my experience with pricing, setup cost, and licensing?

You pay based on your workload. Depending on how much data you process through it, the cost could range from a few hundred dollars to tens of thousands of dollars.

Pricing is comparable, it's somewhere in the middle.

There are no additional fees to the standard licensing fee.

Which other solutions did I evaluate?

We looked at some other tools, such as Databricks, AWS Glue, and MuleSoft.

We already had most of our infrastructure connected to Azure in some way. So the integration of where our data resided appeared to be simpler and safer.

What other advice do I have?

I believe it would be beneficial if they could find someone experienced in some of the tools that are a part of this, such as Spark, not necessarily Data Factory specifically, but some of those other tools that will be very familiar and have a very quick time for productivity. If you're used to doing things in a different way, it may take some time because there isn't as much documentation and community support as there is for some more popular tools.

I would rate Azure Data Factory a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Manager at a tech services company with 51-200 employees
Real User
Top 20
Reasonably priced, scales well, good performance
Pros and Cons
  • "The solution can scale very easily."
  • "My only problem is the seamless connectivity with various other databases, for example, SAP."

What is our primary use case?

My primary use case is getting data from the sensors.

The sensors are installed on the various equipment across the plant, and this sensor gives us a huge amount of data. Some are captured on a millisecond basis.

What we are able to do is bring the data into Azure Data Factory, and it has allowed us to scale up well. We are able to utilize that data for predictive maintenance of the equipment, as well as for predicting breakdowns. Specifically, we use the data to look at predictions for future possible breakdowns. At least, that is what we are looking to build towards.

How has it helped my organization?

It has helped us to take care of a lot of our analytics requirements. We are running a few analytics models on Data Factory, which is very helpful.

What is most valuable?

The overall architecture has been very valuable to us. It has allowed us to scale up pretty rapidly. That's something that has been very good for us. 

The solution can scale very easily.

The stability is very good and has improved very much over time.

What needs improvement?

My only problem is the seamless connectivity with various other databases, for example, SAP. Our transaction data there, all the maintenance data, is maintained in SAP. That seamless connectivity is not there. 

Basically, it could have some specific APIs that allow it to connect to the traditional ERP systems. That'll make it more powerful. With Oracle, it's pretty good at this already. However, when it comes to SAP, SAP has its native applications, which are the way it is written. It's very much AWS with SAP Cloud, so when it comes to Azure, it's difficult to fetch data from SAP.

The initial setup is a bit complex. It's likely a company may need to enlist assistance.

Technical support is lacking in terms of responsiveness.

For how long have I used the solution?

We've been using the solution roughly for about a year and a half.

It hasn't been an extremely long amount of time. 

What do I think about the stability of the solution?

From a security perspective, the product has come a long way.

With the Azure Cloud Platform, in 2015, I was in a different organization and it was not reliable at all. It has become much more reliable since then and is very stable at the moment. It's reliable.

What do I think about the scalability of the solution?

The solution is pretty easy to scale on Azure. I have found it to be very efficient and it is pretty fast. You just need to get the order done properly, and then you will be able to scale up.

We have about five to seven people using it at this time.

How are customer service and technical support?

Technical support isn't the best, as it's a bit delayed at times.

Whenever we need some urgent support, where we have to restart or something is stuck, it takes a bit of time. Some improvements can be made in the customer support area.

In summary, we are not completely satisfied with the support.

How was the initial setup?

The initial setup is not straightforward. It's a bit complex. A company may need to hire someone to assist them with the process.

The solution's deployment took about eight weeks.

What about the implementation team?

I had to hire technical experts who could help us in the process. We could not handle the implementation ourselves.

What's my experience with pricing, setup cost, and licensing?

Cost-wise, it is quite affordable. It's not a factor in the decision-making process when it comes to whether or not we should use it. That said, the pricing is very reasonable.

Which other solutions did I evaluate?

We evaluated both Oracle and SAP before choosing Azure Data Factory.

What other advice do I have?

We are customers and end-users.

I'd advise companies considering the solution that they need to be very clear about the use case they are trying to address. They need to understand the data ecosystem that they have and what percentage of data is coming in from the various ERP systems.

Do that study properly and then come up with the right solution. If, for example, more than 60% of the underlying data that they want to analyze resides in SAP, then Azure would probably not be the right platform to move ahead with.

We're mostly satisfied with the product. However, getting it connected to closed ERP systems like SAP would make it more powerful.

I would rate the solution eight out of ten.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Kamlesh Sancheti - PeerSpot reviewer
Director at a tech services company with 1-10 employees
Real User
Top 10
Comprehensive and user-friendly
Pros and Cons
  • "Azure Data Factory's most valuable features are the packages and the data transformation that it allows us to do, which is more drag and drop, or a visual interface. So, that eases the entire process."
  • "We are too early into the entire cycle for us to really comment on what problems we face. We're mostly using it for transformations, like ETL tasks. I think we are comfortable with the facts or the facts setting. But for other parts, it is too early to comment on."

What is our primary use case?

Azure Data Factory is for data transformation and data loading. It works from your transaction systems, and we are using it for our HRMS, our Human Resource Capital Management System. It picks up all the transactional data and moves it into the Azure Data Warehouse. From there, we would like to create reports on our financial positions and our resource utilization project. These are the reports that we need to build on the warehouse.

The purpose of Azure Data Factory is more about transformations, so it doesn't need to have a good dashboard. But, it has a feeding user interface for us to do our activities and debug actions. I think that's good enough.

What is most valuable?

Azure Data Factory's most valuable features are the packages and the data transformation that it allows us to do, which is more drag and drop, or a visual interface. So, that eases the entire process.

Azure Data Factory setup is quite user-friendly.

I am happy with the interface.

What needs improvement?

We are too early into the entire cycle for us to really comment on what problems we face. We're mostly using it for transformations, like ETL tasks. I think we are comfortable with the facts or the facts setting. But for other parts, it is too early to comment on.

We are still in the development phase, testing it on a very small set of data, and maybe later on a bigger set of data. We might find some pain points once we put it in place and run it. That's when it will be more effective for me to answer that.

For how long have I used the solution?

We are building Azure Data Factory right now internally to extract data from our transactional systems and put them into the warehouse so that the reporting engine can be built too.

What do I think about the scalability of the solution?

We have not tried scaling it up, but Azure promises stability, and scalability should not be an issue.

From a development perspective, I think there are four developers who use Azure Data Factory. From a warehouse perspective, once we roll the reports out, it should be used by at least 40 or 50 people.

How are customer service and technical support?

Generally, the documentation is pretty decent. All the issues that come up are covered in the documentation. We've not really had to go to Microsoft as of now from a support perspective. The documentation and the support that we get over the internet are quite good.

How was the initial setup?

The initial setup was very straightforward.

The initial setup was quite quick, with not much to do. Now, we are mostly developing the use cases. Each use case generally takes around four or five days because it starts right from identifying the right fields, getting the data, transforming it, and finalizing the warehouse structure. That takes a bit of time, but it's pretty straightforward.

What about the implementation team?

We are a technical team so we implemented it in-house.

What's my experience with pricing, setup cost, and licensing?

It's a pay-as-you-go model. I'm not very sure about cost because our usage currently is very low. But, I feel that if the usage extends beyond a certain threshold, it will start getting expensive.

It depends what the threshold is. I see we're not at that threshold right now, so it's pretty decent right now.

Which other solutions did I evaluate?

We were looking at certain other projects and products. For example, we were looking at Snowflake, which has a data warehouse. But the project wasn't working. That's why we selected Azure. The primary reason is that the skills are very easily available for Azure. The second is strategic: because we were trying to be a Microsoft shop, it fits into our strategy. That's all.

What other advice do I have?

If you're a Microsoft shop, if you want to get there easily, I think Azure is one of the better choices. Otherwise, other tools generally require specialized skills and specialized partners to come and implement it. Once implemented, then it becomes much easier to install.

I can't comment right now. I've not talked to it in that fashion. Whatever was required by us, business users have been satisfied in the Data Factory setup.

On a scale of one to ten, I would give Azure Data Factory an eight.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Data Analytics Specialist at a pharma/biotech company with 10,001+ employees
Real User
Top 5 Leaderboard
Quick delivery due to drag-and-drop interface
Pros and Cons
  • "One of the most valuable features of Azure Data Factory is the drag-and-drop interface. This helps with workflow management because we can just drag any tables or data sources we need. Because of how easy it is to drag and drop, we can deliver things very quickly. It's more customizable through visual effect."
  • "Data Factory could be improved by eliminating the need for a physical data area. We have to extract data using Data Factory, then create a staging database for it with Azure SQL, which is very, very expensive. Another improvement would be lowering the licensing cost."

What is our primary use case?

My primary use case of Azure Data Factory is supporting the data migration for advanced analytics projects. 

What is most valuable?

One of the most valuable features of Azure Data Factory is the drag-and-drop interface. This helps with workflow management because we can just drag any tables or data sources we need. Because of how easy it is to drag and drop, we can deliver things very quickly. It's more customizable through visual effect. 

What needs improvement?

Data Factory could be improved by eliminating the need for a physical data area. We have to extract data using Data Factory, then create a staging database for it with Azure SQL, which is very, very expensive. Another improvement would be lowering the licensing cost. 

For how long have I used the solution?

I have been using this solution for the past year. 

What do I think about the stability of the solution?

This solution is stable. We are using an Azure subscription, so there is no maintenance or direct updates, it's just always the latest version.

What do I think about the scalability of the solution?

This solution is automatically scalable, since it's in the cloud. At my company, there were more than one thousand people using this solution because we were a big, media-based company. If there are many user requests in the front-end application and the system is responding slowly, the system will automatically scale up the hardware resources.

How are customer service and support?

I have contacted technical support. I have never faced an issue like that with Denodo. Fortunately, we got some kind of a tutorial PDF, which helps us to deploy everything quickly. 

Which solution did I use previously and why did I switch?

Before working with Azure, I worked with Python. In the culture I was working in, there was no integration. We were using pure Python scripting and Python data manipulation tools. For example, we used Python's pandas library to code the transformation and orchestration of the data needed for the endpoint. It was not at all a visual tool. It took more time than Denodo.

How was the initial setup?

There is no installation because it's on the cloud. You just log on to the cloud with your subscription credentials, then you can use Data Factory directly. 

What about the implementation team?

I implemented through an in-house team. 

What's my experience with pricing, setup cost, and licensing?

Data Factory is very expensive. We are using an Azure subscription, so Data Factory has no direct updates, it's just always the latest version. Compared to Denodo, Azure is very costly. Azure Framework has multiple services, not only Data Factory. So in the cloud-based solution, if you're selecting a particular service, like Data Factory, you need to pay for each request.

Which other solutions did I evaluate?

I also use Denodo. Data Factory is like a transformation layer, but we need an additional staging database or a data storage facility, which is very expensive compared to implementing Denodo. So we extracted the data using Data Factory, then created a staging database with Azure SQL, which cost a huge amount since it's a physical data area. In Denodo, we just implement a layer, which is all handled in Denodo, and not a physical storage mechanism. I prefer customizable data solutions because they improve performance, creativity, and are helpful for front end people.

In comparison to Data Factory's drag-and-drop interface, Denodo developers need to create all the unified views by coding, so we have to create SQL queries to execute. With Data Factory, you can quickly drag and drop data or tables, but in Denodo, it takes more time because you need to code and test and all that.

What other advice do I have?

I rate Data Factory an eight out of ten, mainly because you need a staging database. I recommend Azure to others, but it depends on architecture. In Data Factory, there is no virtualization environment, no layer of virtualization to help with integration and caching mechanisms. Though Data Factory is there, Denodo is going further.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Manoj Kukreja - PeerSpot reviewer
Technical Director, Senior Cloud Solutions Architect (Big Data Engineering & Data Science) at NorthBay Solutions
Real User
Top 20
Great for gathering data and pipeline orchestration; much improved monitoring feature
Pros and Cons
  • "An excellent tool for pipeline orchestration."
  • "The solution needs to be more connectable to its own services."

What is our primary use case?

We generally implement this product for data transformation for our clients. We create the pipelines and provide training before handing it over to them. We generally deal with large-scale organizations. I'm a senior solutions architect. 

How has it helped my organization?

I think the main benefit of this solution is the ease of use, especially for companies that have come from an SSIS type of background where they are used to Microsoft tools. 

What is most valuable?

If you have a very simple pipeline you can use Data Factory for transformations, but it's really for serious analytics. This is an excellent tool for pipeline orchestration; connecting the different components and activities as well as gathering data. It's an orchestration tool, not a transformation tool. The monitoring feature has drastically improved.

What needs improvement?

Data Factory is embedded in the new Synapse Analytics. The problem is if you're using the core Data Factory, you can't call a notebook within Synapse. It's possible to call Databricks from Data Factory, but not the Spark notebook and I don't understand the reason for that restriction. To my mind, the solution needs to be more connectable to its own services.

There is a list of features I'd like to see in the next release, most of them related to oversight and security. AWS has a lake builder, which basically enforces the whole oversight concept from the start of your pipeline but unfortunately Microsoft hasn't yet implemented a similar feature.

For how long have I used the solution?

I've been using this solution for five years. 

What do I think about the stability of the solution?

From what I've seen this is a stable solution. 

What do I think about the scalability of the solution?

The solution is easy to scale, keeping in mind that Data Factory doesn't do any computations. We use it mainly to push the computations to Databricks or Synapse. Projects with our clients generally last a few months and only until they go into production. I believe the ability to scale up is always there.

How are customer service and support?

We typically do not use customer support, but there were a few cases several years ago, as the product was moving to the cloud, when things were not so stable and we contacted support services. They were very good.

Which solution did I use previously and why did I switch?

When I first started in this field, everything was basically Hadoop on-premise and Hadoop infrastructure. With the increase in cloud integrations, things have changed. Once the big data services got introduced, we were probably one of the few companies in North America that were actually into analytics and big data and we were the first to implement related Microsoft products in Canada.

How was the initial setup?

The initial setup is straightforward. I'm a huge fan and user of CI/CD pipelines and never do deployments manually. It's all automated and deployment takes a few minutes.

What's my experience with pricing, setup cost, and licensing?

Licensing costs of Data Factory are reasonable. The cost is mainly on the Synapse and Databricks side of things because they are the tools where the computations are done and where you need more nodes and servers.

What other advice do I have?

It's important to study the solution before purchasing it. The problem in this market is that because most users are generally not very knowledgeable, they typically fall for services that are not compatible with their use case. Data Factory comes with all the transformations but that doesn't work for serious analytics customers who generally need to resort to Databricks or Synapse which involves training and education. Since it's a new field and everything has just blasted off, it's very hard for people to catch on.

In my opinion, Airflow still ranks as number one but I would rate Data Factory an eight out of 10. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Buyer's Guide
Download our free Azure Data Factory Report and get advice and tips from experienced pros sharing their opinions.
Updated: October 2022