We compared Actian Pervasive Data Integrator [EOL], Informatica PowerCenter, and StreamSets based on real PeerSpot user reviews.
"There were no concerns with the stability. This product is very good from a stability perspective."
"We can scale the product."
"It is an excellent ETL tool."
"The most valuable feature is the new Data Lake feature, which provides the basic capabilities needed."
"Has a good visual tool for data mapping."
"If the systems get migrated or upgraded, the amount of resources required are very minimal. We can change the connections and establish a new connection. It's very helpful."
"Complex transformations can be easily achieved by using PowerCenter. The processing layer does transformations and other things. About 80% of my transformations can be achieved by using the middle layer. For the remaining 15% to 20% transformations, I can go in and create stored procedures in the respective databases. Mapplets is the feature through which we can reuse transformations across pipelines. Transformations and caching are the key features that we have been using frequently. Informatica PowerCenter is one of the best solutions or products in the data integration space. We have extensively used PowerCenter for integration purposes. We usually look at the best bridge solution in our architecture so that it can sustain for maybe a couple of years. Usually, we go with the solution that fits best and has proven and time-tested technology."
"The interface is very clean and clear."
"It works with any multi-databases, so it works with Sybase, SQL Server. Also, the performance is really good and it is easy to use."
"The ability to have a good bifurcation rate and fewer mistakes is valuable."
"What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes."
"StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
"It's very easy to integrate. It integrates with Snowflake, AWS, Google Cloud, and Azure. It's very helpful for DevOps, DataOps, and data engineering because it provides a comprehensive solution, and it's not complicated."
"StreamSets data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though, it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"The UI is user-friendly, it doesn't require any technical know-how and we can navigate to social media or use it more easily."
"I am not sure if there are various connectors available in the recent version of Pervasive DI to support the wide range of sources available (e.g., big data, cloud, EME)."
"Integrating new tools can be tricky and challenging."
"Support could be better."
"Compared to solutions offering similar functionalities, Informatica PowerCenter is not very flexible regarding customized integrations."
"The solution's commercial cost is very high. Other open-source tools can do the tool's functions for free. The world is moving to the cloud, but the solution hasn't updated its drivers. I presume that its downfall will start soon. The tool is trying to cross-sell or upsell without helping customers derive benefits from the existing products. They have multiple tools and licenses. It is better to bring the smaller tools in one umbrella."
"The reputation of Informatica is that it is expensive."
"Integrated Reporting service should be more smoothly transitioned from view to function to be in sync with the main design."
"PowerCenter has three clients. I wish they would consolidate everything into one GUI, not three. Also, we had a persistent issue with the Informatica Developer tool but it was solved when we migrated to the newest one."
"I found it is kind of weird that not all of the mapping changes are treated as true changes."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing."
"In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
"Using ETL pipelines is a bit complicated and requires some technical aid."
"Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."
"Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which was painful. Also, pipeline failures were common, and data drifting wasn't addressed, which made things worse. Licensing was another issue we encountered."