Buyer's Guide
Data Integration Tools
June 2022

Read reviews of SSIS alternatives and competitors

PhilipRobinson
Senior Engineer at a comms service provider with 501-1,000 employees
Real User
Top 20
Saves time and makes it easy for our mixed-skilled team to support the product, but more guidance and better error messages are required in the UI
Pros and Cons
  • "The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to troll through lines and lines of code and try and work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it."
  • "Although it is a low-code solution with a graphical interface, often the error messages that you get are of the type that a developer would be happy with. You get a big stack of red text and Java errors displayed on the screen, and less technical people can get intimidated by that. It can be a bit intimidating to get a wall of red error messages displayed. Other graphical tools that are focused at the power user level provide a much more user-friendly experience in dealing with your exceptions and guiding the user into where they've made the mistake."

What is our primary use case?

We're using it for data warehousing. Typically, we collect data from numerous source systems, structure it, and then make it available to drive business intelligence, dashboard reporting, and things like that. That's the main use of it. 

We also do a little bit of moving of data from one system to another, but the data doesn't go into the warehouse. For instance, we sync the data from one of our line of business systems into our support help desk system so that it has extra information there. So, we do a few point-to-point transfers, but mainly, it is for centralizing data for data warehousing.

We use it just as a data integration tool, and we haven't found any problems. When we have big data processing, we use Amazon Redshift. We use Pentaho to load the data into Redshift and then use that for big data processing. We use Tableau as our reporting platform; we've got quite a number of users who are experienced in it, so it is our chosen reporting platform. So, we use Pentaho for the data collection and data modeling aspect of things, such as developing facts and dimensions, but we then publish that data to Redshift as the database platform, and then we use Tableau as our reporting platform.

I am using version 8.3, which was the latest long-term support version when I looked at it the last time. Because this is something we use in production, and it is quite core to our operations, we've been advised that we just stick with the long-term support versions of the product.

It is in the cloud on AWS. It is running on an EC2 instance in AWS Cloud.

How has it helped my organization?

It enables us to create low-code pipelines without custom coding efforts. A lot of transformations are quite straightforward because there are a lot of built-in connectors, which is really good. It has connectors to Salesforce, which makes it very easy for us to wire up a connection to Salesforce and scrape all of that data into another table. Those flows have absolutely no code in them. It also has a Python integrator, so if you want to go into a coding environment, you've got your choice of writing in Java or Python.

The creation of low-code pipelines is quite important. We have around 200 external data sets that we query and pull the data from on a daily basis. The low-code environment makes it easier for our support function to maintain it because they can open up a transformation and very easily see what that transformation is doing, rather than having to trawl through reams and reams of code. ETLs written purely in code become very difficult to trace very quickly. You spend a lot of time trying to unpick them, and they never get commented as well as you'd expect. With a low-code environment, you have your transformation there, and it almost documents itself. So, it is much easier for somebody who didn't write the original transformation to pick it up later on.

We reuse various components. For instance, we might develop a transformation that does a lookup based on the domain name to match to a consumer record, and then we can repeat that bit of code in multiple transformations. 

We have a metadata-driven framework. Most of what we do is metadata-driven, which is quite important because that allows us to describe all of our data flows. For example, Table 1 moves to Table 2, Table 2 moves to Table 3, and so on. Because we've got metadata that explains all of those steps, it helps people investigate where the data comes from and allows us to publish reports that show, "You've got this end metric here, and this is where the data that drives that metric came from." The variable substitution that Pentaho provides to enable metadata-driven frameworks is definitely a key feature.
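
To illustrate the metadata-driven idea described above, here is a minimal, hypothetical Python sketch (it is not Pentaho's API). The table names, metadata structure, and SQL template are invented for the example; the point is simply that one generic template plus a few lines of metadata can drive many flows and document the lineage at the same time.

    # Minimal sketch of a metadata-driven pipeline (illustrative only, not Pentaho's API).
    # Each metadata row names a source and a target table; one generic template is
    # reused for every flow by substituting the metadata values into it.

    flow_metadata = [
        {"source_table": "stg_orders",    "target_table": "dim_orders"},
        {"source_table": "stg_customers", "target_table": "dim_customers"},
    ]

    TEMPLATE_SQL = "INSERT INTO {target_table} SELECT * FROM {source_table}"

    def run_flows(execute_sql, metadata):
        """Run every configured flow by substituting metadata into the template."""
        for row in metadata:
            statement = TEMPLATE_SQL.format(**row)
            execute_sql(statement)  # the metadata itself documents where the data flows

    if __name__ == "__main__":
        run_flows(print, flow_metadata)  # 'print' stands in for a real database executor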

The ability to automate data pipeline templates affects our productivity and costs. We run a lot of processes, and if it wasn't reliable, it would take a lot more effort. We would need a much bigger team to support the 200 integrations that we run every day. Because it is a low-code environment, we don't have to have support incidents escalated to third-line support to be investigated, which affects the cost. Very often our support analysts or more junior members are able to look into what an issue is and fix it themselves without having to escalate it to a more senior developer.

The automation of data pipeline templates affects our ability to scale the onboarding of data because after we've done a few different approaches, new requirements fit into a standard approach. It gives us the ability to scale through code reuse, which also ties in with the metadata aspect of things. A lot of our intermediate stages of processing data are purely configured in metadata, so no custom coding is required to implement a transformation. It is really just writing a few lines of metadata to drive the process, and that gives us quite a big efficiency.

It has certainly reduced our ETL development time. I've worked at other places that had a similar-sized team managing a system with far fewer integrations. We've certainly managed to scale Pentaho not just for the number of things we do but also for the type of things we do.

We do the obvious direct database connections, but there is a whole raft of different types of integrations that we've developed over time. We have REST APIs, and we download data from Excel files that are hosted in SharePoint. We collect data from S3 buckets in Amazon, and we collect data from Google Analytics and other Google services. We've not come across anything that we've not been able to do with Pentaho. It has proved to be a very flexible way of getting data from anywhere.
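
As a rough illustration of two of the collection patterns mentioned above (a REST API and an S3 bucket), the following Python sketch uses the requests and boto3 libraries. The URL, token, bucket, and key are placeholders, and this is not how the Pentaho steps themselves are implemented; it just shows the kind of calls these integrations boil down to.

    import boto3     # AWS SDK for Python
    import requests  # HTTP client

    def fetch_rest_api(url, token):
        """Pull a JSON payload from a REST endpoint (URL and token are placeholders)."""
        response = requests.get(url, headers={"Authorization": f"Bearer {token}"}, timeout=30)
        response.raise_for_status()
        return response.json()

    def fetch_s3_object(bucket, key, local_path):
        """Download one object from an S3 bucket into a local staging area."""
        s3 = boto3.client("s3")
        s3.download_file(bucket, key, local_path)

    if __name__ == "__main__":
        data = fetch_rest_api("https://example.com/api/v1/accounts", "TOKEN")
        fetch_s3_object("example-landing-bucket", "exports/accounts.csv", "/tmp/accounts.csv")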

Our time savings are probably quite significant. By using some of the components that we've already got written, our developers are able to, for instance, put in a transformation from a staging area to its modeled data area in an hour or a couple of hours. If they were starting from a blank piece of paper, that would be several days' worth of work.

What is most valuable?

The graphical nature of the development interface is most useful because we've got people with quite mixed skills in the team. We've got some very junior, apprentice-level people, and we've got support analysts who don't have an IT background. It allows us to have quite complicated data flows and embed logic in them. Rather than having to trawl through lines and lines of code and try to work out what it's doing, you get a visual representation, which makes it quite easy for people with mixed skills to support and maintain the product. That's one side of it.

The other side is that it is quite a modular program. I've worked with other ETL tools, and it is quite difficult to get component reuse out of them. With tools like SSIS, you can develop your packages for moving data from one place to another, but it is really difficult to reuse much of it, so you have to implement the same code again. Pentaho seems quite adaptable, with reusable components or sections of code that you can use in different transformations, and that has helped us quite a lot.

One of the things that Pentaho does is provide a virtual web service ability to expose a transformation as if it were a database connection; for instance, when you have a REST API that you want to be read by something like Tableau, which needs a JDBC connection. Pentaho was really helpful in getting that driver enabled for us to do some proof-of-concept work on that approach.
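
As a very rough sketch of what consuming such a virtual table over JDBC can look like from the client side, the Python snippet below uses the jaydebeapi bridge. The driver class name, JDBC URL format, JAR path, credentials, and table name are all assumptions or placeholders that would need to be taken from the Pentaho data service documentation and installation; this is not a verified configuration.

    import jaydebeapi  # generic JDBC bridge for Python

    # NOTE: the driver class, URL format, JAR path, and service/table names below are
    # assumptions/placeholders; the real values come from the Pentaho data service
    # documentation and the driver JAR shipped with the product.
    conn = jaydebeapi.connect(
        "org.pentaho.di.trans.dataservice.jdbc.ThinDriver",  # assumed driver class
        "jdbc:pdi://pentaho-server:8080/kettle",             # assumed URL format
        ["service_user", "service_password"],                # placeholder credentials
        "/opt/pentaho/pdi-dataservice-client.jar",           # hypothetical JAR path
    )
    try:
        cursor = conn.cursor()
        cursor.execute("SELECT * FROM my_virtual_table")     # the exposed transformation
        for row in cursor.fetchall():
            print(row)
    finally:
        conn.close()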

What needs improvement?

Although it is a low-code solution with a graphical interface, the error messages that you get are often of the type that a developer would be happy with. You get a big stack of red text and Java errors displayed on the screen, and less technical people can get intimidated by that wall of red error messages. Other graphical tools aimed at the power-user level provide a much more user-friendly experience in dealing with exceptions and guiding the user to where they've made the mistake.

Also, some of the components have so many options. Some guidance embedded in the interface about when to use certain options would be good, so that people know what a setting would do and when they should use it. It is quite light on that aspect.

For how long have I used the solution?

I have been using this solution since the beginning of 2016, so it has been about six and a half years.

What do I think about the stability of the solution?

We haven't had any problems in particular that I can think of. It is quite a workhorse; it just sits there running reliably, and it has got a lot to do every day. We have occasional memory issues if some transformations haven't been written in the best way possible, and we obviously get our own bugs that we introduce into transformations, but generally, we don't have any problems with the product.

What do I think about the scalability of the solution?

It meets our purposes. It does have horizontal scaling capability, but that is not something we have needed to use. We have lots of small and medium-sized data sets and don't have to deal with super large data sets. Where we do have some requirements for that, it works quite well because we can push some of that processing down onto our cloud provider. We've dealt with some of those requirements by using S3, Athena, and Redshift; you can offload some of the big data processing to those platforms.
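
As an illustration of pushing processing down to the cloud provider, the sketch below submits a query to Athena with boto3 and polls until it finishes. The database, query, and S3 output location are placeholders; it is only meant to show the general shape of offloading an aggregation, not any part of our actual framework.

    import time
    import boto3

    def run_athena_query(sql, database, output_s3):
        """Push a heavy aggregation down to Athena instead of processing it in the ETL tool."""
        athena = boto3.client("athena")
        execution = athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": database},
            ResultConfiguration={"OutputLocation": output_s3},
        )
        query_id = execution["QueryExecutionId"]
        while True:  # simple polling loop
            state = athena.get_query_execution(QueryExecutionId=query_id)
            status = state["QueryExecution"]["Status"]["State"]
            if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
                return status
            time.sleep(2)

    if __name__ == "__main__":
        print(run_athena_query(
            "SELECT event_date, count(*) FROM raw_events GROUP BY event_date",
            database="analytics_db",                   # placeholder database
            output_s3="s3://example-athena-results/",  # placeholder results bucket
        ))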

How are customer service and support?

I've contacted them a few times. In terms of Lumada's ability to quickly and effectively solve issues that we brought up, we get a very good response rate. They provide very prompt responses and are quite engaging. You don't have to wait long, and you can get into a dialogue with the support team with back and forth emails in just an hour or so. You don't have to wait a week for each response cycle, which is something I've seen with some of the other support functions. 

I would rate them an eight out of 10. We've got quite a complicated framework, so it is not possible for us to send the whole thing over for them to look into, but they certainly give help in terms of tweaks to server settings and some memory configurations to try and get things going. We run a codebase that is quite big and quite complicated, so sometimes it might be difficult to produce something that we can send over to show what the errors are. They wouldn't log in and look at your actual environment; it has to be based on the log files. So, it is a bit abstract. If something is occurring just on a very specific transformation that you've got, it might be difficult for them to drill into it and see why it is causing a problem on our system.

Which solution did I use previously and why did I switch?

I have a little bit of experience with AWS Glue. Its advantage is that it is tied natively into the AWS PySpark processing. Its disadvantage is that it writes some really difficult-to-maintain lines of code for all of its transformations, which might work fine if you have just a dozen or so transformations, but if you have a lot of transformations going on, it can be quite difficult to maintain.

We've also got quite a lot of experience working with SSIS, and I much prefer Pentaho to SSIS. SSIS ties you rigidly to the data flow structure that exists at design time, whereas Pentaho is very flexible. If, for instance, you wanted to move 15 columns to another table, in SSIS you'd have to configure that with your 15 columns, and if a 16th column appeared, it would break that flow. With Pentaho, without amending your ETL, you can just amend your end data set to accept the 16th column, and it will allow it to flow through. This, and the fact that the transformation isn't tied down at design time, makes it much more flexible than SSIS.

In terms of component reuse, other ETL tools are not nearly as good at being able to just pick up a transformation or a sub-transformation and drop it into your pipelines. You do tend to keep rewriting things again and again to get the same functionality.

What about the implementation team?

I was here during the initial setup, but I wasn't involved in it. We used an external company. They do our upgrades, etc. The reason for that is that we tend to stick with just the long-term support versions of the product. Apart from service packs, we don't do upgrades very often. We never get a deep experience of that, so it is more efficient for us to bring in this external company that we work with to do that.

What was our ROI?

It is always difficult to quantify a return on investment for data warehousing and business intelligence projects. It is a cost center rather than a profit center, but if you take the starting point as this being something that needs to be done, you can compare the tools available to do it. In the long run, you wouldn't necessarily find the alternatives much cheaper. If you went for more of a coded approach, it might be cheaper in terms of licensing, but then you might have higher costs of maintaining it.

What's my experience with pricing, setup cost, and licensing?

It does seem a bit expensive compared to serverless product offerings. Tools such as SQL Server Integration Services are "almost" free with a database engine. It is comparable to products like Alteryx, which is also very expensive.

It would be great if we could use our enterprise license and distribute the tool to analysts and people around the business to use in place of Tableau Prep and similar tools, but its UI is probably a bit too confusing for that level of user. So, it doesn't allow us to distribute the tool to non-technical users across the organization as widely as we would like.

What other advice do I have?

I would advise taking advantage of using metadata to drive your transformations. You should take advantage of the very nice and easy way in which variable substitution works in a lot of components. If you use a metadata-driven framework in Pentaho, it will allow you to self-document your process flows. At some point, that always becomes a critical aspect of a project. Often, it doesn't crop up until a year or so later, but somebody always comes asking for proof or documentation of exactly what is happening: how something is getting here and how something is driving a metric. So, if you start off from the beginning with a metadata framework that self-documents that, you'll be 90% of the way to answering those questions when you need to.

We are satisfied with our decision to purchase Hitachi's products, services, or solutions. In the low-code space, they're probably reasonably priced. There is some competition from serverless architectures, and you can do things differently using a serverless architecture, which would have an overall lower running cost. However, the fact that we run so many transformations, that those transformations can be maintained by a team of people who aren't Python or Java developers, and that our apprentices can use this tool quite easily is an advantage of it.

I'm not too familiar with the overall roadmap for Hitachi Vantara. We're just using the Pentaho data integration products. We don't use the metadata injection aspects of Pentaho, mainly because we haven't had a need for them, but we know they're there.

I would rate it a seven out of 10. Its UI is a bit techy and more confusing than some of the other graphical ETL tools, and that's where improvements could be made.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Database Administrator at an energy/utilities company with 1,001-5,000 employees
Real User
Top 5
Leaderboard
Keeps the source and target synchronized at all times
Pros and Cons
  • "The main impact for Oracle LogMiner is the performance. Performance is drastically reduced if you use the solution’s Oracle Binary Log Parser. So, if we have 60 million records, initially it used to take a minute. Now, it takes a second to do synchronization from the source and target tables."
  • "Right now, they have a good notification system, but it is in bulk. For example, if I have five projects running and I put a notification, the notification comes back to me for all five projects. I would like the notification to come back only for one project."

What is our primary use case?

We use it for replication. We have databases in SQL Server. Some data needs to go to Oracle for the application team because that application is connected to Oracle databases, while the back-end application is connected to SQL Server. We create workflows where SQL Server is the source, Oracle is the target, and all the tables in SQL Server replicate to Oracle. We have 59 flows for five databases, and these are multiplied by three across production, development, and staging. That is how many flows we have.

How has it helped my organization?

There are applications that have stopped supporting Oracle, and the entire application is being migrated to SQL Server. The entire application data comes into SQL Server, but because the other applications are still linked to Oracle data, they still need Oracle. Before we had Equalum, we used to write Python scripts to pull data from SQL Server and put it in Oracle, but the Python scripts required a lot of maintenance and development. Also, if there was any problem, you needed development knowledge to go and change the Python script.
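
To give a flavour of the kind of hand-written script being described (this is not the actual code, just a hypothetical sketch), here is a minimal full-reload sync from SQL Server to Oracle in Python using pyodbc and cx_Oracle. Connection strings, credentials, and table names are placeholders; every schema change or error-handling tweak in a script like this has to be handled by a developer, which is the maintenance burden Equalum removed.

    import cx_Oracle  # Oracle client library
    import pyodbc     # ODBC driver bridge for SQL Server

    # Connection details and table names below are placeholders for illustration.
    SQL_SERVER_DSN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;DATABASE=appdb;UID=etl;PWD=secret"
    ORACLE_DSN = cx_Oracle.makedsn("orahost", 1521, service_name="ORCLPDB1")

    def sync_table():
        """Copy rows from a SQL Server table into its Oracle counterpart (crude full reload)."""
        src = pyodbc.connect(SQL_SERVER_DSN)
        tgt = cx_Oracle.connect("etl", "secret", ORACLE_DSN)
        try:
            rows = src.cursor().execute("SELECT id, name, updated_at FROM dbo.customers").fetchall()
            tgt_cur = tgt.cursor()
            tgt_cur.execute("TRUNCATE TABLE customers")  # reload everything each run
            tgt_cur.executemany(
                "INSERT INTO customers (id, name, updated_at) VALUES (:1, :2, :3)",
                [tuple(r) for r in rows],
            )
            tgt.commit()
        finally:
            src.close()
            tgt.close()

    if __name__ == "__main__":
        sync_table()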

Since getting Equalum, the data has been flowing very fast. I don't have any knowledge of Python scripts, but still I can create flows. The data streams very well. The data is in synchronization as well. The notification system is good. So, if there are any problems in SQL Server or Oracle, Equalum notifies us that there is a problem, then we can go and check the problem on our end. If the problem is on their end, we have the ticketing system, which is very good. You can open the tickets. If it is a critical production issue, then you open a ticket and they respond very quickly. 

Initially, there was a big development team running the Python scripts. Now, we don't need to hire anyone extra. As an SQL Server DBA, I take care of it. We also have a help desk team who takes care of it. With all the people who are using Equalum, it does not need any extra support, hires, or resources.

Overall, Equalum has resulted in a lot of system performance improvements in our organization. It has helped us out by keeping the source and target synchronized at all times.

What is most valuable?

It has good features. It has a replication feature that is wonderful because the data is streaming live and we can change the polling rates. Initially, this took 50 seconds; now, whatever changes happen in SQL Server are reflected in Oracle within 30 seconds when polled via Equalum.

The Equalum tool is a good development tool and user-friendly as well. The front-end is user-friendly because it has a nice, easy methodology. It takes hardly a day to teach someone who can then create the workflow. Once the workflow is set, you don't have to do anything. The data constantly flows from SQL Server to Oracle, i.e., the source to the target. 

It has a strong command-line feature. With a purely front-end tool like SSIS, I have to create each flow manually. However, in Equalum, we can write a command-line program and deploy 50 to 100 flows at once through the command line.

Equalum provides a single platform for the following core architectural use cases: CDC replication, streaming ETL, and batch ETL. The CDC is important for me as a SQL Server DBA. If there were no CDC, all my data would have to be pulled directly from my tables, which are already linked to the application, so there would be a performance hit. Now, because there is CDC, the changed data goes into the CDC table and Equalum pulls from that CDC table. Therefore, there is no user impact on my DB servers.

They have something called binary logs for Oracle. If you have these logs in place, then you can pull the data through the logs. That is convenient because you can pull the big data in through batch processing, which I have not personally used myself. Though I have seen, in my organization, people using batches because they can schedule them. While my data is live streaming and keeps on streaming every three minutes, some data doesn't require live streaming. So, every day in the morning, after I pull the data from source to target, then they can use batch processing, which is good.

It is important to me that the solution provides a no-code UI, with Kafka and Spark fully managed in the platform engine, because then I don't have to take care of anything. There are no backup problems. For the flows that I create, I don't have to back up, restore, or maintain them. I just need to create the workflow from my end: I need a user in the source and a user in the target from the database perspective. Then, everything from the front end through Kafka is taken care of by Equalum, which makes it very user-friendly.

When we are taking the data from the source to the target, we can add fields, like a timestamp. Data accuracy is 100 percent: whatever data you have in the source is the exact data reflected in the target. For the many months that I have been using it for all my projects, I haven't found any data discrepancies. There has not been a time when the source data was different from the target data, which is very good.

What needs improvement?

Right now, they have a good notification system, but it is in bulk. For example, if I have five projects running and I put a notification, the notification comes back to me for all five projects. I would like the notification to come back only for one project. They are working on this improvement because we told them about it. These are the small changes that we keep asking them for, and they do them for us. If you want features or modifications, they help us with that. So, the team is on it at all times.

For how long have I used the solution?

I have been using it for six months. 

The company has been using the solution for six to seven years.

What do I think about the stability of the solution?

It is robust. The stability is good. Long-term, it is a nice, strong tool.

What do I think about the scalability of the solution?

We have multiple nodes. For failover, the data fails over to another node, then it is distributed. Initially, when we started Equalum, it was only one project with 59 flows. Now, we have 400 to 500 flows. It is easily scalable. We didn't have to do much on our side for scalability purposes. 

My number of flows was around 50 initially, but now I am running 500 flows because we are bringing more applications onto SQL Server as Oracle support is dropped. The more data that comes into SQL Server, the more streaming we have to do using Equalum. We are talking about huge scalability. For us as users, we don't have to do much: instead of seeing 50 flows on the screen, I see 500 flows on the screen. However, behind the scenes, I think Equalum has to give us more resources.

How are customer service and technical support?

If there is anything that we want to change, we go to the Equalum team. The support is wonderful. They came back to us, giving us a demo on how to use it. They were very nice in that way. They respond very quickly. Their support is very good.

They keep giving us more training on how to use Equalum. The Equalum team comes in and tells us about new features. We have a meeting where they talk with us every week.

When I used to stream a flow from the source to the target, if something changed or stopped working, I would bring the entire source to the target as brand new. This is called restreaming. Restreaming used to take a lot of time, but they have done new upgrades, and with those upgrades, restreaming is very fast. Also, previously they didn't have this restreaming feature on the front end; wherever restreaming had to be done, it had to be done from the command line. Now, they have brought the restream feature to the front end. These are two very good things that they have done for us recently.

Which solution did I use previously and why did I switch?

We used Python scripts previously. Heavy development on the Python side was needed, and it requires a developer experienced in writing Python scripts; they must have that understanding. Plus, maintenance also needs to be done by a developer. Because Equalum is a UI tool, you can do so many things with it. It's a tool versus a script, and obviously, you will prefer the tool.

I like the overall ease of use of the solution's user interface very much because I was a heavy user of SSIS before, which was the only ETL tool I had used for data warehousing. When I came to this company six months back, I was introduced to Equalum. I find Equalum very good because it supports multiple sources and targets. There are quite a few very good options, like SQL Server to Oracle, and so on. As long as the source and the target have Java Database Connectivity (JDBC), they can be replicated. The tool is very simple to use. The command line takes time to understand, but once you understand it, it is easy going. The front end is very user-friendly, so there aren't any issues.

How was the initial setup?

The initial setup was straightforward. There is nothing complex. Obviously, there were commands that I didn't know how to write at first, and they helped me to understand them. Once you understand the commands, using the command line and the front end is all straightforward. There are no hidden complexities.

They have good documentation. Yesterday, I was asking the Equalum team about something, so they sent me the documentation for it. The documentation is well detailed, and they have videos supporting it. If there is a new feature coming out, or any new training you want to do, they have videos in place. The videos are very good, so you can review the code, follow the video, and then do your work.

If it is a SQL Server, then as a DBA, I have to enable CDC and make sure there is a user with proper privileges. Then, if I have Oracle, I need a user over there with proper privileges, based on what they have given us in the documentation. Once all this is ready on my end, then it is a straightforward deployment.
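
For context, enabling CDC on a SQL Server source is normally done with the standard sys.sp_cdc_enable_db and sys.sp_cdc_enable_table procedures. The sketch below runs them from Python via pyodbc; the connection string, schema, table, and role names are placeholders, and the exact privileges Equalum needs should be taken from its documentation rather than from this example.

    import pyodbc

    # Placeholder connection string; run as a user with db_owner/sysadmin rights on the source.
    CONN = "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;DATABASE=appdb;UID=dba;PWD=secret"

    ENABLE_CDC = """
    EXEC sys.sp_cdc_enable_db;                -- turn CDC on for the database
    EXEC sys.sp_cdc_enable_table
         @source_schema = N'dbo',
         @source_name   = N'customers',       -- placeholder table
         @role_name     = N'cdc_reader';      -- role the replication user will be granted
    """

    def enable_cdc():
        conn = pyodbc.connect(CONN, autocommit=True)
        try:
            conn.cursor().execute(ENABLE_CDC)
        finally:
            conn.close()

    if __name__ == "__main__":
        enable_cdc()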

Deployment does not take much time. A brand new deployment takes half an hour to an hour maximum. When bringing all the tables from source to target for the first time, it takes some time, around five hours maximum, for all the data to stream. Once it is streamed, it is very quick. If there are very few tables, I have seen deployment finish in half an hour.

What was our ROI?

The main impact, compared to Oracle LogMiner, is the performance. The synchronization time is drastically reduced if you use the solution's Oracle Binary Log Parser. So, if we have 60 million records, it used to take a minute; now, it takes a second to synchronize the source and target tables.

If we were not using Equalum, then we would need to use Python scripts, C#, etc., which need heavy development and more time. Timing is okay, because you only need to write the script one time, then you can use it. However, the maintenance is very difficult. If you don't have someone with the knowledge of Python and C#, then you cannot go and modify the scripts. Whereas, in Equalum, we work with an Equalum support team, and our Flex team also takes care of Equalum. If there is an issue or if they want a flow to be created, they do it themselves. We don't even have to have any scripting or programming knowledge.

Equalum has improved the speed of data delivery by more than 50 percent. A Python script used to take time to run; then you had to schedule it and take care of the scheduler. Sometimes, for some reason, the scheduler did not work, and then your job failed. This solution does not have that issue. It can do live streaming if you want, or, if you want batch processing, you can schedule batches and it runs.

Which other solutions did I evaluate?

Our team did PoCs and selected Equalum.

What other advice do I have?

We don't use it much for its transformation part. We didn't initially know about the transformation part of it. For example, if I have a new number column in the source and I want to round up the figures or do some string transformation, find, or replace, then I can directly do that from the transformation operators. We obviously used it for replication before. Now, we are using it for transformation as well.

If you want strong replication between any source and target with JDBC, go for Equalum. It's simple, easy to use, and requires less maintenance and tasks to be done. The tool takes care of all your requirements. So, you don't need to do daily backup and restore tasks. It is a straightforward tool. So, if you're using ETL, try Equalum. It is the best bet.

I would rate the solution as 10 out of 10. I have no issues so far.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Technology Director at a computer software company with 1,001-5,000 employees
Real User
Top 20
A stable, scalable, and mature solution for complex transformations and data integration
Pros and Cons
  • "Complex transformations can be easily achieved by using PowerCenter. The processing layer does transformations and other things. About 80% of my transformations can be achieved by using the middle layer. For the remaining 15% to 20% transformations, I can go in and create stored procedures in the respective databases. Mapplets is the feature through which we can reuse transformations across pipelines. Transformations and caching are the key features that we have been using frequently. Informatica PowerCenter is one of the best solutions or products in the data integration space. We have extensively used PowerCenter for integration purposes. We usually look at the best bridge solution in our architecture so that it can sustain for maybe a couple of years. Usually, we go with the solution that fits best and has proven and time-tested technology."
  • "Its licensing can be improved. It should be features-wise and not bundle-wise. A bundle will definitely be costly. In addition, we might use one or two features. That's why the pricing model should be based on the features. The model should be flexible enough based on the features. Their support should also be more responsive to premium customers."

What is our primary use case?

We receive the data files from insurance carriers. We load the data, do some processing, and then make the data available to their internal systems. We apply some business transformations and data validations and share the data in the healthcare domain and EDI B2B exchange.

Its version varies based on the customer, but when it comes to the cloud, we have the latest version. We have been using Informatica Intelligent Cloud Services (IICS) for two years. We have used IICS for two purposes. One is for data integration, and the other one is for application integration.

What is most valuable?

Complex transformations can be easily achieved by using PowerCenter. The processing layer does transformations and other things. About 80% of my transformations can be achieved by using the middle layer. For the remaining 15% to 20% of transformations, I can go in and create stored procedures in the respective databases. Mapplets is the feature through which we can reuse transformations across pipelines. Transformations and caching are the key features that we have been using frequently. 

Informatica PowerCenter is one of the best solutions or products in the data integration space. We have extensively used PowerCenter for integration purposes. We usually look at the best bridge solution in our architecture so that it can sustain for maybe a couple of years. Usually, we go with the solution that fits best and has proven and time-tested technology.

What needs improvement?

Its licensing can be improved. It should be features-wise and not bundle-wise. A bundle will definitely be costly. In addition, we might use one or two features. That's why the pricing model should be based on the features. The model should be flexible enough based on the features.

Their support should also be more responsive to premium customers.

For how long have I used the solution?

We have been dealing with this product for around 20 years.

What do I think about the stability of the solution?

It is stable.

What do I think about the scalability of the solution?

It is easy to scale. It has the concept of a grid, but it is too costly because you also need Informatica Big Data Edition. Informatica PowerCenter has its own software licensing called Big Data Edition. So, whenever we want to scale to a huge volume, we need to go with Big Data Edition. It has horizontal scaling, which always impacts the price. I can go for a four-node cluster grid or an eight-node cluster grid, which will have an impact on the price.

We have scaled it for one of our enterprise customers, and we had a little bit of complexity in setting it up, but we were able to achieve the desired goals with respect to the performance.

How are customer service and technical support?

We have been in touch with Informatica support. They should be more responsive. They should react quickly to premium customers.

Which solution did I use previously and why did I switch?

Different products are suitable for different use cases. One unified product cannot address multiple use cases. For example, Informatica may not be suitable for real-time integration, but it may be helpful for batch integration. Similarly, MuleSoft caters to low latency, but it doesn't get into high latency, which means it is not suitable for larger data processing systems. That's why we need to have a combination of technologies. For example, I may say that I have a resource or a person who has a primary skill and a secondary skill. Primary skill is specialized, and then secondary skill is complementary to that. Similarly, every product has a primary thing and a secondary thing. For Informatica, it is the batch processing, not the real-time data processing. It only processes the structured data, but it does not process unstructured data. For unstructured data, you need to go for a different product. For the cloud version, they have introduced IICS specifically for catering to the application integration requirements.

How was the initial setup?

It is a simple process because we have a pool of experts in our team. Some manual work could be required. We did have challenges while setting it up, but we were able to resolve them. 

What about the implementation team?

We provide development, integration, operations, and maintenance services for Informatica solutions to the customers. We do ETL monitoring, and we monitor the daily loads and resolve and manage the pipeline features. 

We also handle the change requests coming from the customer and performance optimization. We may sometimes also look at the transformations optimization, caching, and all those things.

What's my experience with pricing, setup cost, and licensing?

It is for big enterprises. We have leveraged Informatica for big enterprises but not for small and medium enterprises because it is a very costly product as compared to other products. We propose this solution only for enterprise customers. For small to medium enterprises, we would propose the Microsoft solution.

Its licensing is currently bundle-wise. It should be features-wise and not bundle-wise.

What other advice do I have?

It is a very mature product, but now everyone is moving to the cloud. They need to give more attention and focus to IICS rather than PowerCenter. PowerCenter on-premises will cater to one or two industries, and they should give more features to IICS, which currently caters to only about 70% of the features of Informatica PowerCenter. The data integration is not as mature in IICS. It doesn't have Mapplets, and we have faced some constraints. We have already logged a service request for this.

I would rate Informatica PowerCenter a nine out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Implementor
Software Engineer at a tech vendor with 1,001-5,000 employees
Real User
Effective transformation, beneficial data manipulation, but user interfaces could improve
Pros and Cons
  • "The most useful functions of Qlik Replicate are the data manipulation to transformations."
  • "When you remote into it the Qlik Replicate UI a lot of times it just freezes. We set up the EC2, to allow them to go to the server and click on the Replicate icon, it just opens up and just sits there. At that point, we have to go into the EC2 and then reboot the server. This should be fixed, it is frustrating."

What is our primary use case?

We use EC2 as the cloud service for Qlik Replicate. We generally use Qlik Replicate to replicate from a SQL Server source to a SQL Server destination or to an AWS RDS instance, either a single on-premises server or AWS RDS Aurora PostgreSQL.

How has it helped my organization?

Qlik Replicate has improved our organization because it has given each team the ability to do their own data replication into our single source of data, such as our data lake in AWS. All the data can be moved to a single location.

What is most valuable?

The most useful functions of Qlik Replicate are data manipulation and transformations.

What needs improvement?

When you remote into the Qlik Replicate UI, a lot of times it just freezes. We set up the EC2 to allow them to go to the server and click on the Replicate icon, but it just opens up and sits there. At that point, we have to go into the EC2 and reboot the server. This should be fixed; it is frustrating.

When the solution suddenly fails, the error we receive is just a message saying "Error." You have to go and figure out which setting you have to tweak to get the actual error. This could be improved by being more descriptive or more intuitive. They have eight different logging options, such as enhanced logging, but you don't really know what each one means. The wording doesn't tell you why it failed, which makes it difficult to find the actual problem.

In a future release, they could improve the solution by making it easier to use two different destinations. If we want to have one destination on-premises and one in the cloud, it seemed difficult when I attempted the process. It would be useful to have one source and two different destinations, using the same transformations and other configurations.

For how long have I used the solution?

I have been using Qlik Replicate for approximately six months.

What do I think about the scalability of the solution?

The scalability is very good because we are using Amazon AWS.

We have approximately 10 people using the solution in my organization and most of them are developers, such as software engineers. The solution is moderately used in our organization.

How are customer service and support?

The technical support could improve. It is difficult to connect with them on the phone. There can be a lot of back and forth communication.

Which solution did I use previously and why did I switch?

We previously used SSIS and an in-house built product.

We ended up choosing Qlik Replicate because we were doing a lot of transforms and decided to consolidate the data in one place. Qlik Replicate was a good product for getting data over without turning on CDC. We wanted a tool that was easy to use, and we decided Qlik Replicate was right for us.

How was the initial setup?

The initial setup involved setting up the source and the privileges needed on every source. DBAs don't like that too much, but they have to give each team access privileges; that's the only way to use Qlik Replicate. The initial troubleshooting and other aspects of the implementation took about a month to get everything set up correctly, with all the details right.

What about the implementation team?

We did the implementation in-house. The solution has not needed a lot of maintenance, we do not have an assigned individual for maintenance.

Which other solutions did I evaluate?

We are evaluating other solutions at this time, such as DM, which uses Qlik Replicate in the background.

What other advice do I have?

Overall the solution is straightforward to use. It does a good job at what it needs to, but it's not as robust in the transformations, for example, the lookups.

I rate Qlik Replicate a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
User with 5,001-10,000 employees
Real User
Easy to set up, and reasonably priced, but the user experience could be improved
Pros and Cons
  • "Microsoft supported us when we planned to provision Azure Data Factory over a private link. As a result, we received excellent support from Microsoft."
  • "User-friendliness and user effectiveness are unquestionably important, and it may be a good option here to improve the user experience. However, I believe that more and more sophisticated monitoring would be beneficial."

What is most valuable?

Essentially, Azure Data Factory is more aligned to ETL, but I wanted a full data lake solution where I could leverage functionality across ETL, data ingestion, data warehousing, and the data lake.

What needs improvement?

I was planning to switch to Synapse and was just looking into Synapse options.

I wanted to plug things in and then feed them into Power BI. Basically, I'm planning to shift some data and, leveraging existing skills, use Synapse for performance.

I am not a frequent user, and I am not an Azure Data Factory engineer or data engineer. I work as an enterprise architect, so Data Factory essentially becomes a component of my solution. I see where it fits and plan on using it. It could be Azure Data Factory or Data Lake, but I'm not sure what enhancements it would require.

User-friendliness and user effectiveness are unquestionably important, and improving the user experience would be a good option here. I also believe that more sophisticated monitoring would be beneficial.

For how long have I used the solution?

I work as an enterprise architect, and I have been using Azure Data Factory for more than a year.

I am working with the latest version.

What do I think about the stability of the solution?

Azure Data Factory is a stable solution.

What do I think about the scalability of the solution?

Azure Data Factory is a scalable product.

In my current company, I have a team of five people, but in my previous organization, there were 20.

How are customer service and support?

Technical support is good. We encountered no technical difficulties. Microsoft supported us when we planned to provision Azure Data Factory over a private link. As a result, we received excellent support from Microsoft.

Which solution did I use previously and why did I switch?

Products such as Azure Data Factory and Informatica Enterprise Data Catalog were evaluated. This is something I'm working on. I work as an enterprise architect, so these are the tools that I frequently use.

Previously, I worked with SSIS. We did not really switch: because we were building a cloud-based ETL solution, Azure Data Factory was the option, whereas for on-premises solutions, SQL Server Integration Services (SSIS) was the option.

How was the initial setup?

The initial setup is easy.

It took three to four weeks to get up to speed and get comfortable using it.

What's my experience with pricing, setup cost, and licensing?

Pricing appears to be reasonable in my opinion.

What other advice do I have?

My only advice is that Azure Data Factory is a good choice, particularly for data ingestion. But if you want to go further and build an entire data lake solution, I believe Synapse is preferred. In fact, Microsoft is developing and designing Synapse to bring data ingestion and the data lake together in one place. You must make a decision: if the solution is dedicated only to that type of data ingestion, then I believe Data Factory is the best option.

I'm not a frequent user right now. At the enterprise architect level, I need to think beyond Data Factory to include machine learning and everything else, so Data Factory becomes a very small part of the overall picture. I was also inundated with Power Automate, Power Apps, Power Virtual Agents, and everything else in Microsoft that I had to think about.

I would rate Azure Data Factory a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.