We performed a comparison between Azure Data Factory and erwin Data Catalog by Quest based on real PeerSpot user reviews. Find out what your peers are saying about Microsoft, Informatica, Oracle and others in Data Integration Tools.
"StreamSets' data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done, but it took approximately an hour to an hour and a half. With StreamSets, since it works on a data collector-based mechanism, the same process completes in about 15 minutes. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate, reducing the data transfer, including the drift part, by 45 minutes."
"I have used the Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started, build data pipelines, and connect to databases. They become as comfortable as any technical person within a couple of weeks."
"In StreamSets, everything is in one place."
"StreamSets' data drift feature gives us an alert upfront so we know that the data can still be ingested. Whenever the schema or data types change, the data lands automatically in the data lake without any intervention from us, and that alert gives us the information we need to fix downstream pipelines, which process the data into models like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, and then we needed to spend about a week fixing them. Right now, we are saving one to two weeks. Though it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
"It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data, and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one DataOps solution."
"Azure Data Factory became more user-friendly when data-flows were introduced."
"I think it makes it very easy to understand what data flow is and so on. You can leverage the user interface to do the different data flows, and it's great. I like it a lot."
"An excellent tool for pipeline orchestration."
"The data mapping and the ability to systematically derive data are nice features. It worked really well for the solution we had. It is visual, and it did the transformation as we wanted."
"Its integrability with the rest of the activities on Azure is most valuable."
"The most valuable feature of Azure Data Factory is that it has a good combination of flexibility, fine-tuning, automation, and good monitoring."
"The most valuable features of Azure Data Factory are the flexibility, ability to move data at scale, and the integrations with different Azure components."
"The most valuable feature of Azure Data Factory is the core features that help you through the whole Azure pipeline or value chain."
"When you combine it with data lineage, every time you need to make a change, it allows you to do impact analysis on any changes and then connect to the end-users or data stewards so that they can be aware that a change is coming. That's one of the main benefits we use it for."
"The logging mechanism could be improved. If I am working on a pipeline and create a job out of it, the running job generates constant logs, so the logging mechanism could be simplified. Right now, it is a bit difficult to understand and filter the logs, and it takes some time."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables in SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, you can just point it at the schema or the 100 tables and ingest the information. However, you can't do that in SAP HANA, since StreamSets is currently lacking this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
"Data Factory's monitorability could be better."
"For some of the data, there were some issues with data mapping. Some of the error messages were a little bit foggy. There could be more of a quick start guide or some inline examples. The documentation could be better."
"There is always room to improve. There should be good examples of use that, of course, customers aren't always willing to share. It is a Catch-22. It would help the user base if everybody had really good examples of deployments that worked, but when you ask people to put out their good deployments, which also includes me, you usually get, "No, I'm not going to do that." They don't have enough good examples. Microsoft probably just needs to pay one of their partners to build 20 or 30 examples of functional Data Factories and then share them with the user base."
"Snowflake connectivity was recently added, and if the vendor provided some videos on how to create data, that would be helpful."
"There should be a way to do switches, so that if at any point I want to use a hybrid mode for data collection or ingestion, I can just click a button."
"User-friendliness and user effectiveness are unquestionably important, and improving the user experience would be a good option here. I also believe that more sophisticated monitoring would be beneficial."
"Azure Data Factory should be cheaper when moving data to a data center abroad in case of disasters."
"The vendor needs to work more on developing out-of-the-box connectors for other products like Oracle, AWS, and others."
"There are always ways to improve things. For example, AI could support the search. When we are typing something and don't know the exact term, artificial intelligence could suggest terms that are phonetically or syntactically similar, so that instead of having to type the exact name, we can pick it from a list. When you have thousands and thousands of terms, it is hard to remember all the names."
StreamSets offers an end-to-end data integration platform to build, run, monitor and manage smart data pipelines that deliver continuous data for DataOps, and power the modern data ecosystem and hybrid integration.
Only StreamSets provides a single design experience for all design patterns for 10x greater developer productivity; smart data pipelines that are resilient to change for 80% fewer breakages; and a single pane of glass for managing and monitoring all pipelines across hybrid and cloud architectures to eliminate blind spots and control gaps.
With StreamSets, you can deliver the continuous data that drives the connected enterprise.
Azure Data Factory is a managed cloud service built for extract-transform-load (ETL), extract-load-transform (ELT), and data integration projects. It is a data integration tool that allows users to create, schedule, and manage data pipelines in the cloud or on premises. Use cases of the product include data engineering, operational data integration, analytics, ingesting data into data warehouses, and migrating on-premises SQL Server Integration Services (SSIS) packages to Azure.
The tool allows users to create data-driven workflows for initiating data movement and data transformation at scale. Data can be ingested from disparate data stores via pipelines. Companies can utilize this product to build complex ETL processes that transform data visually with data flows. Azure Data Factory also integrates with services such as Azure HDInsight Hadoop, Azure Databricks, Azure Synapse Analytics, and Azure SQL Database. These services facilitate data management and control for organizations, providing them with better visibility of their data for improved decision-making.
Azure Data Factory allows companies to create schedules for moving and transforming data into their pipelines. This can be done hourly, daily, weekly, or according to the specific needs of the organization. The steps through which the data-driven workflows work in Azure Data Factory are the following:
1. Connecting to required sources and collecting data. After connecting to the various sources where data is stored, the pipelines move the data to a centralized location for further processing.
2. Transforming and enriching the data. Once the data is moved to a centralized data store in the cloud, the pipelines transform it through services like HDInsight Hadoop, Azure Data Lake Analytics, Spark, and Machine Learning.
3. Delivering the transformed data to on-premises data stores or keeping it in cloud storage for usage by different tools and applications.
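The workflow steps above are expressed in Azure Data Factory as JSON pipeline definitions. As a minimal sketch of step 1 (connect and collect), the hypothetical pipeline below uses a Copy activity to move data from a blob storage dataset to a SQL database dataset; the pipeline and dataset names are placeholders, and the referenced datasets and their linked services would be defined separately in the factory:

```json
{
  "name": "ExampleCopyPipeline",
  "properties": {
    "description": "Hypothetical pipeline: copy data from blob storage to Azure SQL Database",
    "activities": [
      {
        "name": "CopyFromBlobToSql",
        "type": "Copy",
        "inputs": [
          { "referenceName": "SourceBlobDataset", "type": "DatasetReference" }
        ],
        "outputs": [
          { "referenceName": "SinkSqlDataset", "type": "DatasetReference" }
        ],
        "typeProperties": {
          "source": { "type": "BlobSource" },
          "sink": { "type": "SqlSink" }
        }
      }
    ]
  }
}
```

The hourly, daily, or weekly scheduling described above would be handled by attaching a schedule trigger to a pipeline like this one, rather than by the pipeline definition itself.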
Azure Data Factory Concepts
The solution consists of a series of interconnected building blocks that together provide data integration and related services for users. Its core concepts include pipelines, activities, datasets, linked services, integration runtimes, and triggers.
Azure Data Factory Benefits
Azure Data Factory offers clients several benefits, including flexibility, the ability to move data at scale, built-in monitoring, and integration with other Azure components.
Reviews from Real Users
According to Dan M., a Chief Strategist & CTO at a consultancy, Azure Data Factory is secure and reasonably priced.
A Senior Manager at a tech services company says the tool is reasonably priced, scales well, and offers good performance.
erwin Data Catalog (DC), part of the erwin Data Intelligence Suite, automates enterprise metadata management, including data mapping, code generation, data profiling, data lineage, and impact analysis. It integrates and activates data in a single, unified catalog in accordance with business requirements by scheduling ongoing scans of metadata from a wide array of data sources, keeping metadata current with full versioning and change management, easily mapping data elements from source to target (including data at rest and in motion), and harmonizing data integration across platforms.
Azure Data Factory is ranked 1st in Data Integration Tools with 40 reviews while erwin Data Catalog by Quest is ranked 10th in Metadata Management with 1 review. Azure Data Factory is rated 7.8, while erwin Data Catalog by Quest is rated 8.0. The top reviewer of Azure Data Factory writes "There's the good, the bad and the ugly....unfortunately lots of ugly". On the other hand, the top reviewer of erwin Data Catalog by Quest writes "Helps with metadata management, saves time, and allows us to do impact analysis on any changes". Azure Data Factory is most compared with Informatica PowerCenter, Microsoft Azure Synapse Analytics, Informatica Cloud Data Integration, Alteryx Designer and Talend Open Studio, whereas erwin Data Catalog by Quest is most compared with WhereScape RED, Talend Open Studio, Alation Data Catalog, Informatica Enterprise Data Catalog and SAP Information Steward.
We monitor all Data Integration Tools reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.