"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
"In StreamSets, everything is in one place."
"It is really easy to set up and the interface is easy to use."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"StreamSets data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though, it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
"Its arrays are powerful enough to handle migrations even when the replication is happening in the background, without causing any trouble with the ongoing traffic."
"It is not like a traditional ETL, but it gives quite a lot of flexibility."
"The compare feature is the most valuable piece of it."
"The most valuable features of IBM Cloud Pak for Data are the Watson Studio, where we can initiate more groups and write code. Additionally, Watson Machine Learning is available with many other services, such as APIs which you can plug the machine learning models."
"One of Cloud Pak's best features is the Watson Knowledge Catalog, which helps you implement data governance."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."
"HVR Software's technical support could be improved. Whenever we log a case, the response that we get from the support is a bit delayed."
"It should have a few more monitoring functionalities."
"The documentation can be laid out better to make it easier to find things, and I really wish there was built-in support for changing passwords. Some features don't work as advertised for the platform/repository database, and HVR is not always the fastest at getting results."
"One thing that bugs me is how much infrastructure Cloud Pak requires for the initial deployment. It doesn't allow you to start small. The smallest permitted deployment is too big. It's a huge problem that prevents us from implementing the solution in many scenarios."
"There is a solution that is part of IBM Cloud Pak for Data called Watson OpenScale. It is used to monitor the deployed models for the quality and fairness of the results. This is one area that needs a lot of improvement."
HVR Software is ranked 24th in Data Integration Tools with 3 reviews, while IBM Cloud Pak for Data is ranked 29th in Data Integration Tools with 2 reviews. HVR Software is rated 8.6, while IBM Cloud Pak for Data is rated 7.6. The top reviewer of HVR Software writes "Good stability and scalability, easy setup, and valuable compare feature". On the other hand, the top reviewer of IBM Cloud Pak for Data writes "Plenty of features, multiple services available, but machine learning could improve". HVR Software is most compared with Oracle GoldenGate, Qlik Replicate, AWS Database Migration Service, Matillion ETL and Informatica Cloud Data Integration, whereas IBM Cloud Pak for Data is most compared with Azure Data Factory, IBM InfoSphere DataStage, Palantir Foundry, Denodo and Informatica PowerCenter.
We monitor all Data Integration Tools reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.