What is our primary use case?
We use this solution to ingest data from one of our SAP source systems. From the SAP HANA view, we push data to our data pond and then ingest it into our data warehouse.
How has it helped my organization?
Azure Data Factory did not add much value while we were also using Alteryx. Alteryx is user-friendly, while Azure Data Factory uses many resources and has issues with parallel workflows. Alteryx helps you diagnose issues more quickly than Azure Data Factory, because Azure Data Factory runs on the cloud and its debugger has a cold start.
Azure Data Factory has to wake up whenever you are trying to do testing, and it takes about four to five minutes. It's not always online to do a quick test. For example, if we want to test an Excel file to see if the formatting is correct or why the data-flow or pipeline is failing, we need to wait four to five minutes to get the cold start debugger to run. Compared to Alteryx, Azure Data Factory could be better. Nevertheless, we are using it because we have to.
What is most valuable?
Initially, when we started using it, we didn't like it because it was not yet mature and had no data-flows, so we used traditional pipelines. After that, Azure Data Factory introduced the concept of data-flows, and it started to become more mature and look more like Alteryx. Azure Data Factory became more user-friendly when data-flows were introduced.
What needs improvement?
They introduced the concept of Flowlets, but it has bugs. A Flowlet is a reusable component that you can use when building data-flows. We can configure a Flowlet as a reusable piece of logic and plug it into different data-flows, so we don't have to rewrite our code or visual transformations.
It works fine when we first plug a Flowlet in and configure it in our data-flow. But if we then make even minor changes to the data-flow, the Flowlet reverts to its original state. It does not retain our changes, so we must reconfigure the Flowlet repeatedly. We had these issues three months ago, so things might have changed.
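To illustrate the reuse idea behind Flowlets (not the bug, and not Azure Data Factory's actual API), here is a minimal plain-Python analogy. The function names `clean_names`, `customer_flow`, and `supplier_flow` and the `name` field are hypothetical; the point is only that one transformation is defined once and plugged into several flows.

```python
def clean_names(rows):
    """Reusable step (the 'Flowlet' in this analogy): trim and title-case the name field."""
    return [{**row, "name": row["name"].strip().title()} for row in rows]


def customer_flow(rows, flowlet=clean_names):
    """Hypothetical flow 1: apply the shared step, then drop rows with an empty name."""
    cleaned = flowlet(rows)
    return [row for row in cleaned if row["name"]]


def supplier_flow(rows, flowlet=clean_names):
    """Hypothetical flow 2: apply the same shared step, then sort by name."""
    cleaned = flowlet(rows)
    return sorted(cleaned, key=lambda row: row["name"])
```

What we expected from Flowlets is the same property this sketch has: editing `customer_flow` should not reset or change how `clean_names` is configured.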
For how long have I used the solution?
We have used this solution for about a month and a half. It is a cloud-based tool, so there are no versions. It is all deployed on Azure Cloud.
What do I think about the stability of the solution?
Everything is computed inside the SQL server if we're working with pipelines, so we have to be very careful when designing our solution in Azure Data Factory. Alteryx spoiled us: we never cared how things looked in the back end because all the operations ran on the Alteryx server. In Azure Data Factory, they run on the capacity of our data warehouse. Azure Data Factory does not run our queries itself; it sends each query directly to the SQL server or data warehouse instance. So we have to be very careful about how we perform certain operations.
We need to have knowledge of SQL and how to optimize our queries. If we want to join one table to another in Alteryx, it is pretty easy: we just drop in a Join tool. If we want to do the same with a stored procedure called from Azure Data Factory, we have to be very careful about how we write our code. That is a challenge for our team, because we had not been looking into how to optimize our SQL queries when firing queries from Azure Data Factory to the data warehouse.
In addition, the workflows were running very slowly, the performance was bad, and some queries were timing out because we have a time threshold. So we faced many challenges and had to reeducate ourselves on SQL and query optimization.
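As a rough illustration of the difference described above, here is a minimal Python sketch that uses the standard-library `sqlite3` module as a stand-in for the data warehouse. The tables (`customers`, `orders`), the index, and the function names are all hypothetical, and this is not how Azure Data Factory is configured; it only contrasts joining in the tool with sending one query for the database to execute.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database standing in for the warehouse (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        -- An index on the join key gives the engine a cheaper way to match rows.
        CREATE INDEX idx_orders_customer ON orders (customer_id);
        INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
        INSERT INTO orders VALUES (1, 1, 10.0), (2, 1, 5.0), (3, 2, 7.5);
        """
    )
    return conn


def totals_client_side(conn):
    """Pull both tables out and join them in the tool (the style we were used to)."""
    regions = dict(conn.execute("SELECT id, region FROM customers"))
    totals = {}
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
        region = regions[customer_id]
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """Send a single query so the database engine does the join and aggregation."""
    query = """
        SELECT c.region, SUM(o.amount)
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.region
    """
    return dict(conn.execute(query))
```

Both functions return the same totals. In the second one all the work happens inside the database, which is why the way the query is written, and whether the join keys are indexed, decides how much warehouse capacity it consumes.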
What do I think about the scalability of the solution?
In regard to scaling, when Azure Data Factory was used with Azure Databricks, it worked similarly to Hadoop or Spark: it had Spark clusters in the back end that could scale out as needed and speed up performance. So it is scalable, especially with Databricks, because a lot of data-related transformations can be performed.
On my team, there are approximately 20 people who work with Azure Data Factory.
How are customer service and support?
We do not have experience with customer service and support.
How was the initial setup?
It does not require any installation and is more like software as a service. You need to create an instance of Azure Data Factory in Azure and configure some of the connections to your databases. You can connect to your blob storage, and some authentication is necessary for Azure Data Factory.
The setup is straightforward. It doesn't take much time, and it's on the cloud. It requires a few clicks, and you can quickly set it up and grant access to the developer. Then the developer can go to the link and start developing within their browser.
We have a team that takes care of the cloud infrastructure, so we raise a ticket to request infrastructure, and they execute it based on the naming convention with the project name.
What about the implementation team?
We have an entire team that takes care of the cloud infrastructure. So we raise a ticket when we need infrastructure, which is executed based on the naming convention for the project name.
What was our ROI?
The nature of our solution is not based on ROI because we are building solutions for other functions within the same organization. In addition, due to the large size of our organization and the services we provide, the ROI is not something we consistently track. It's something discussed with the management, so I can't comment on it.
What's my experience with pricing, setup cost, and licensing?
The cost is based on usage and the computing resources consumed. However, since Azure Data Factory pipelines can connect with so many other Azure services, such as Azure Functions, Logic Apps, and others, additional costs can be incurred by using those tools.
Which other solutions did I evaluate?
We did not evaluate other options because this solution was aligned with our current work environment.
What other advice do I have?
I rate the solution a seven out of ten. The solution is good and constantly improving, but Flowlets should be fixed so that they retain the changes we make. I advise users considering this solution to first thoroughly understand what Azure Data Factory is and evaluate what's available in the market. Second, they should assess the nature of their use cases and the kind of products they will be building before choosing a solution.
*Disclosure: I am a real user, and this review is based on my own experience and opinions.