Buyer's Guide
Data Integration Tools
November 2022
Get our free report covering Talend, SAP, and other competitors of Talend Data Management Platform. Updated: November 2022.
655,465 professionals have used our research since 2012.

Read reviews of Talend Data Management Platform alternatives and competitors

Managing Director at a consultancy with 11-50 employees
Reseller
Top 10
Frees staff to focus on data workflows and on what can be done with the data, rather than on the details of the technology
Pros and Cons
  • "It's a really powerful platform in terms of the combination of technologies they've developed and integrated together, out-of-the-box. The combination of Kafka and Spark is, we believe, quite unique, combined with CDC capabilities. And then, of course, there are the performance aspects. As an overall package, it's a very powerful data integration, migration, and replication tool."
  • "It's got it all, from end-to-end. It's the glue. There are a lot of other products out there, good products, but there's always a little bit of something missing from the other products. Equalum did its research well and understood the requirements of large enterprise and governments in terms of one tool to rule them all, from a data migration integration perspective."
  • "They need to expand their capabilities in some of the targets, as well as source connectors, and native connectors for a number of large data sources and databases. That's a huge challenge for every company in this area, not just Equalum."

What is our primary use case?

Equalum is used to take legacy data, siloed data, and information in the enterprise and integrate it into a consolidated database that is then used, for the most part, for data transactions, whether that's an actual transaction within the enterprise itself or a BI dashboard. BI dashboards and analytics are a big area, but overall the main use case is large-scale data integration across disparate data sources.

The way it's deployed is a combination of on-premises and cloud. It's mainly on-prem, but in Japan and Korea, the adoption of cloud is definitely nowhere near the same as in North America or even Europe. So most of it is deployed on-prem, but there is some hybrid connectivity, in terms of targets being in the cloud and legacy sources being on-prem. Next-gen infrastructure is hosted in the cloud, either by a global partner like AWS, or by their own data infrastructure, in more of a hosted scenario. 

The other model that we're looking at with a couple of partners is a managed service model, whereby Equalum is hosted in the partner's cloud and they provide multi-tenancy for their mainly SME customers.

How has it helped my organization?

The advantage of having a no-code UI, with Kafka and Spark fully managed in the platform engine, comes down to human resources. The number of engineers it takes to both develop something of this nature yourself, and then maintain it, is significant. It's not easy. Even if you do have a boatload of engineers and a homegrown capability built on open source Spark or Kafka, trying to integrate them, along with multiple other open source technologies, into one platform is a major challenge. Purely from the point of view of getting up to speed, out-of-the-box, within 30 minutes you can literally spin up an instance of Equalum. Anybody who tries to deal with Kafka as well as Spark, and then tries to use the technologies, is quite blown away by how quick and easy it is to get moving. You can realize ROI much faster with Equalum.

Equalum allows a lot of staff to focus on what they know and what they can do, versus having to learn something from scratch, and that lowers the overall risk in a project. Improving the risk profile is one of the key benefits as well.

Another benefit is that it frees staff to focus on the consultation aspects, the data workflow and mapping out of an organization's data, and understanding what one can do with that data. It enables them to focus on the real value, instead of getting into the nitty-gritty of the technology. The added value is immediate for companies deploying this tool, versus having to develop and maintain their own.

It gives you the ability to handle those data transformations and migrations on-the-fly. It would take a huge effort if you had to do that manually or develop your own tool. The overall legacy situation of databases in most organizations does not allow for real-time crunching of information. Without Equalum, it would be impossible to implement those kinds of data-related processes, unless you have a tool that can really be performant at that level and that speed. That's what this solution is about: data transformation inside an enterprise, getting it to go from legacy databases to future database capabilities, from batch to real-time.

What is most valuable?

It's a really powerful platform in terms of the combination of technologies they've developed and integrated together, out-of-the-box. The combination of Kafka and Spark is, we believe, quite unique, combined with CDC capabilities. And then, of course, there are the performance aspects. As an overall package, it's a very powerful data integration, migration, and replication tool. We've looked at a number of other products but Equalum, from a technology perspective, comes out head and shoulders above the rest. We tend to focus mainly on trying to move the legacy businesses here in Japan, which are very slow in moving, and in Korea, from batch and micro-batch to real-time. The combination of those technologies that I just mentioned is really powerful.

It also stands out, very much so, in terms of its ease of use. It's super-simple to use. It has its own Excel-type language, so as long as you know how to use Excel, in terms of data transformation, you can use this tool. And we're talking about being able to do massive data integration and transformation. And that's not referring to the drag-and-drop capabilities, which are for people who have zero skills. Even for them it's that easy. But if somebody does want to do some customization, it has a CLI based on the solution's own code transformations, which are as easy as Excel-style commands. And they've got all the documentation for that.

For consultants, it's a dream tool. A large consultancy practice providing services to large enterprises can make a boatload of money from consultants spending hours and hours developing workflows and actually implementing them right away. And they could then copy those workflows across organizations, or inside the same organization. So you create a drag-and-drop scenario once, or a CLI once, and you could use that in multiple situations. It's very much a powerful tool from a number of angles, but ease of use is definitely one of them.

In addition, it's a single platform for core architectural use cases: CDC replication, streaming ETL, and batch ETL. It also has micro-batch. It's got it all, from end-to-end. It's the glue. There are a lot of other products out there, good products, but there's always a little bit of something missing from the other products. Equalum did its research well and understood the requirements of large enterprise and governments in terms of one tool to rule them all, from a data migration integration perspective.

The speed of data delivery is super-fast. In some cases, when you look at the timestamp of data in the target versus the source, it's down to the hundredths of a second and it's exactly the same number. That's at the highest level, but it's super-fast. It's lightning fast in terms of its ability to handle data.

What needs improvement?

There are areas they can do better in, like most software companies that are still relatively young. They need to expand their capabilities in some of the targets, as well as source connectors, and native connectors for a number of large data sources and databases. That's a huge challenge for every company in this area, not just Equalum.

If I had the wherewithal to create a tool that could allow for all of that connectivity out-of-the-box, it would be massive. There are updates every month, and open source changes constantly, so maintaining compatibility with these sources or targets is not easy. And a lot of targets are proprietary, and the vendors actually don't want you to connect with them in real time. They want to keep that connectivity for their own competitive tool.

What happens is that a customer will say, "Okay, I've got Oracle, and I've got MariaDB, and I've got SQL Server over here, and I've got something else over there. And I want to aggregate that, and put it into Google Cloud Platform." Having connectors to all of those is extremely difficult, as is maintaining them.

So there are major challenges to keeping connectivity to those data sources, especially at a CDC level, because you've got to maintain your connectors. And every change that's made with a new version that comes out means they've got to upgrade their version of the connector. It's a real challenge in the industry. But one good thing about Equalum is that they're up for the challenge. If there's a customer opportunity, they will develop and make sure that they update a connector to meet the needs of the customer. They'll also look at custom development of connectors, based on the customer opportunity. It's a work in progress. Everybody in the space is in the same boat. And it's not just ETL tools. It's everybody in the Big Data space. It's a challenge.

The other area for improvement, for Equalum, is their documentation of the product. But that comes with being a certain size and having a marketing team of 30 or 40 people and growing as an organization. They're getting there and I believe they know what the deficiencies are. Maintaining and driving a channel business, like Equalum is doing, is really quite a different business model than the direct-sales model. It requires a tremendous amount of documentation, marketing information, and educational information. It's not easy.

For how long have I used the solution?

We've been involved with Equalum for about 18 months.

We're partners with Equalum. We resell it into Japan and Korea. The model in North Asia is a reseller/channel model. The majority of our activity is signing resellers and managing the channels for Equalum into markets. We're like an extension of their business, providing them with market entry into North Asia.

We've been reselling every version from that day, 18 months ago, up until now. We've been up to version 2.23, which is the most recent. We're reselling the latest version and installing it on channel partners' technology infrastructure. We tend to use the most recent version, as long as there are no major bugs. So far, that's been the case. We've been installing the most recent product at that moment, without any issues, and the reseller is then using that for internal PoCs or other ones.

What do I think about the stability of the solution?

The stability of Equalum is very good. We deal with double-byte character sets in Japan, so there are little things here and there to deal with when new versions come, but there are basically no issues at all. It's very solid. It's running in multi-billion dollar organizations around the world. I haven't heard any complaints at all. The stability factor seems to be very good.

What do I think about the scalability of the solution?

The scalability is the beauty of this product. It scales both vertically and horizontally. It provides ease of scalability. You've got the ability to add CPU vertically, in terms of its hardware servers, and you've got the ability to add additional nodes to a cluster as well. It's really good for continuing to build larger and larger clusters to handle larger and larger datasets. It does require manual intervention to scale horizontally. Vertically it is literally just a matter of adding hardware to the rack, but horizontal scaling does take some professional services. It's not like pressing a button and, presto. It's not a cloud-based, AWS-type of environment at this time. But that's fine because sometimes, with this kind of data and these types of customer environments, you definitely want to be able to understand what you're dealing with. You've got hundreds, if not thousands, of workflows. It's not something as simple as just clicking a button.

The scalability is a really interesting component and is a reason we are very interested in working with Equalum. It's easy to expand. Obviously, we're in the business to generate revenue. So if we can get adoption of the tool inside an organization, and they continue to add more and more data to the overall infrastructure's architecture, all we need to do is expand the scalability of the solution and we generate additional revenue.

How are customer service and technical support?

Their technical support is fantastic. They have support in Houston and Tel Aviv. We mainly deal with Tel Aviv because it's a much better time zone for us here in Japan and Korea. We use Slack with them. And I was just talking with them this morning about a new support portal they've just released, that we're going to have full access to. 

We carry Equalum business cards and Equalum email addresses. We really are like an extension of the business in Asia, even though we're a separate entity. So when we communicate with the channel partners, we use Equalum email addresses and in doing so we also then answer technical support requests. Our own technical team is going to be integrated into the support portal at some point very soon so that our teams in Korea and Japan can handle questions coming from the customers in the language of the country. We'll be a second line of support to our resellers and, if necessary, of course, it's easy to escalate to the third line support in the U.S. or in Tel Aviv, 24/7.

Which solution did I use previously and why did I switch?

We looked at a few, but we haven't actually worked with another ETL vendor. We had good relations with a number, but we never actually took on an ETL tool in the past.

We dealt in other solution offerings, including BI, real-time business intelligence, and database infrastructure products, and we did have some interaction with those ETL vendors, but we weren't really impressed. There wasn't anything that could really keep up with the kind of data speeds we were trying to process. I came across Equalum at a Big Data event back in 2019. I walked up to their booth and started chatting with Moti, who is the head of global sales. At the time, we were looking for something along these lines to combine with our current offering.

This is the first time we have taken on a tool of this nature and it has become the core of our sales approach. We lead with Equalum because you first need to get access to the data. When you transform the data, you can land the data in a database, so we sell another database product. And after we've landed the data in the database target, we then connect it with business analytics and AI tools or platforms, which we also sell. It has become the core, the starting point, of all of our sales activities.

How was the initial setup?

The difficulty level of the initial setup depends on the skills of the engineer who's involved. It now takes our guys a maximum of 30 minutes to deploy it, but the first time they did it, it took a little bit of time to go through it and to understand how it's done. Now, it's really easy, but that first time requires a little bit of hand holding with Equalum's engineers. It's relatively easy once you go through it the first time.

We don't do implementations at the customer location. That's what our partners do. But we support that and we help them with that. Getting the software up and running has been pretty easy. The challenges are around connecting to the database of the customers and getting through VPNs and the like. That's the challenge of getting into any enterprise infrastructure.

What was our ROI?

In terms of savings based on insights enabled through Equalum's streaming ETL, we're not a customer, so we haven't seen the savings. And gathering ROI on these kinds of topics is always difficult, even if you are talking to a customer. But it all comes down to the cost of the technology and the cost of human resources to develop it, maintain it, and manage it, and the project it's being used for. But those savings are certainly the reason our resellers and their customers are looking at acquiring the tool.

What's my experience with pricing, setup cost, and licensing?

They have a very simple approach to licensing. They don't get tied up with different types of connectivity to different databases. If you need more connectors or if you need more CPU, you just add on. It's component-based pricing. It's a really easy model for us and, so far, it's worked well. 

The actual pricing is relative. You talk to some people and they say, "Oh, it's too expensive." And then you talk to other people and they say, "Whoa, that's cheap." It's a case-by-case issue.

For the most part, when we go up against local CDC products, we're always priced higher, but when we go up against big, global ETL vendors, we're priced lower. It comes down to what the customer needs. Is what we have overkill for them? Do they really just need something smaller and less expensive? If pricing is a problem, then we're probably in the wrong game or talking to the wrong people.

Overall, we feel that the pricing is reasonable. We just hope that they don't increase the price too much going forward.

Which other solutions did I evaluate?

We had looked at the likes of Talend and Informatica, of course. There are so many local Japanese and Korean CDC products, but we didn't go really heavily in-depth into that sector because, as soon as we saw and understood what Equalum had to offer, we liked the people we were dealing with, they liked our approach, and they were also interested. Frankly, we didn't see anybody else that had the same combination of technologies, and Equalum wasn't yet in North Asia.

There are a lot of functionalities that are similar to other products, such as drag-and-drop capabilities and workflows, when you get into the nitty-gritty, but it's the overall package in one tool that is very powerful.

And in terms of consolidating and reducing the number of tools you use, if you look at Informatica, you need four products from Informatica to do what the one product from Equalum does. There's serious consolidation. There are definitely a lot of ETL products that don't do CDC, so you have to buy two products. The whole concept of having one tool handle most of the requirements is a strong selling point of Equalum, and it's good for customer adoption. The side note to that is that if customers have already spent millions of dollars with their current tool, it becomes difficult for them to adopt Equalum. 

What other advice do I have?

It's a great tool to use in any data transformation opportunity, especially focusing on real-time. The word "batch" should never be used in an organization, going forward. I know it's useful and it has its use cases and there are situations where it's helpful. Batch is a form of history and it will probably always be there until that legacy finally disappears. But, overall, if anybody wants to look at migrating and transforming their overall data into a real-time enterprise, there's not a better tool in the market today, in terms of its performance, price, usability, and support. Those four things are the reasons we're selling it.

The biggest thing I have learned from working with Equalum is how difficult it is to actually manage your own Spark and Kafka clusters, and to process data at speed. It's difficult to have the infrastructure and all of the other software to combine everything. In some organizations the effort takes hundreds of people, depending on the size of the enterprise, how much data is involved, and the overall system architecture. What opened my eyes was the fact that, with this tool, you have the ability to alleviate all of the potential headaches associated with developing or maintaining your own clusters of these open source products. 

Large, well-known Asian companies literally have over 1,000 engineers dedicated to managing open source clusters. Those companies are wasting so much money, effort, and brain power by having their engineers focused on managing these really basic things, when they could be deploying a third-party tool like Equalum. They could be letting their engineers drive larger revenue opportunities with more value added around things like what to do with the data, and how to manage the data and the data flow. They could create real value from integrating data from disparate data sources, instead of focusing on the minutia, such as maintaining clusters. The mindset of some of these Japanese and Korean companies is back in 1995. That's the reality. It's a challenge because getting them to change their older business approach and ideology is always difficult. But that is what opened my eyes, the fact that this tool can literally alleviate thousands of people doing a job that they don't need to be doing.

As for the data quality that results from using the tool, it's dependent upon the person who's using the tool. If you are able to transform the data and take the information you want out of it, then it can help your data quality. You can clean up the data and land it in whatever type of structure you would like. If you know what you're doing, you can create really high-quality data that is specific to the needs of the organization. If you don't know what you're doing, and you don't know how to use the tool, you can create more problems. But for the most part, it does allow for data cleansing and the ability to create higher-quality data.

When it comes to Oracle Binary Log Parser as opposed to LogMiner, my understanding is that it has higher performance capabilities in terms of transacting data. It allows for the performance of data being migrated and/or replicated, as well as the transaction processing that takes place in the database, to happen in a much more performant way. The ability to handle those types of binary log requirements is really important to get the performance necessary. Equalum has a partnership with Oracle. Oracle seems to be very open and very positive about the partnership. Although we are replacing a lot of the Oracle GoldenGate legacy products, they don't seem to be too worried because we're connecting to their databases still, and we're landing data, in some cases, into their cloud-based data infrastructure as well. There's definitely a lot of power in the relationship between Equalum and Oracle. It's a very strong benefit for both the product and the company.

From a business point of view, as a partner, we are continuing to educate and to bring resellers up to speed about the new kid on the block. It's always difficult being a new product in these markets because these markets are very risk-averse and they don't really like small companies. They prefer to deal with large companies, even though the large companies' technologies are kind of outdated. It's a challenge for us to try to educate and to make them aware that their risk is going to be low. That's what's important at the end of the day: It's about lowering risk for partners. If adopting a new technology increases their risk, even if the performance of the technology is better, they won't go along with it, which is a very different mindset to North America, in my opinion.

Overall, we sell a combination of multiple products in the real-time data and Big Data AI space, and Equalum is the core of our offering. It literally does provide the plumbing to the house. And once you get the plumbing installed, it's generally going to be there for a long time. As long as it meets performance requirements, and as long as it continues to evolve and adapt, most companies are really happy to keep it.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor. The reviewer's company has a business relationship with this vendor other than being a customer: Reseller.
Karthik Rajamani - PeerSpot reviewer
Principal Engineer at Tata Consultancy Services
Real User
Top 10
Integrates with different enterprise systems and enables us to easily build data pipelines without knowing how to code
Pros and Cons
  • "I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
  • "We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."

What is our primary use case?

I worked mostly on data ingestion use cases when I was using Data Collector. Later on, I got involved with some Spark-based transformations using Transformer.

Currently, we are not using CI/CD. We are not using automated deployments. We are manually deploying in prod, but going forward, we are planning to use CI/CD to have automated deployments.

I worked on on-prem and cloud deployments. The current implementation is on-prem, but in my previous project, we worked on AWS-based implementation. We did a small PoC with GCP as well.

How has it helped my organization?

It is very easy to use when connecting to enterprise data stores such as OLTP databases or messaging systems such as Kafka. I have had integration with OLTP as well as Kafka. Until a few years ago, we didn't have a good way of connecting to the streaming databases or streaming products. This ability is important because most of our use cases in recent times are of a streaming nature. We have to deliver certain messages or data as per our SLA, and the combination of Kafka and StreamSets helps us meet those timelines. I'm not sure what I would have used to achieve the same five years ago. The combination of Kafka and StreamSets has opened up a new world of opportunities to explore.

I recently used orchestration, wherein you can have multiple jobs and you can orchestrate them. For example, you can specify to let Job A run first, then Job B, and then Job C in an automated fashion. You don't need any manual intervention. In one of my projects, I had a data hub from 10 different databases. It was all automated by using Kafka and StreamSets.
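To make that orchestration pattern concrete, here is a minimal conceptual sketch in Python of running dependent jobs strictly in sequence with no manual intervention. It is not the StreamSets API; the job names and the scripts they invoke are hypothetical stand-ins for orchestrated pipeline jobs.

```python
# Conceptual sketch of sequential job orchestration (not the StreamSets API).
# Each job runs only after the previous one reports success, so the chain
# needs no manual intervention once it is kicked off.
import subprocess
import sys

# Hypothetical commands standing in for Job A, Job B, and Job C.
JOBS = [
    ("job_a_ingest", ["python", "ingest_from_oltp.py"]),
    ("job_b_transform", ["python", "transform_to_staging.py"]),
    ("job_c_publish", ["python", "publish_to_kafka.py"]),
]

def run_in_order(jobs):
    for name, cmd in jobs:
        print(f"starting {name}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Stop the chain so downstream jobs never see partial data.
            sys.exit(f"{name} failed with exit code {result.returncode}")
        print(f"{name} finished")

if __name__ == "__main__":
    run_in_order(JOBS)
```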

It enables you to build data pipelines without knowing how to code. You can build data pipelines even if you don't know how to code. You can just drag and drop. If you know how to code, you can do some custom coding as well, but you don't need to know coding to work with StreamSets, which is important if somebody in your team is not familiar with coding. The nature of coding is changing, and the number of technologies is changing. The range is so wide right now. Even if I know Java or Oracle, it may not be enough in today's times because we might have databases in Teradata. We might have Snowflake or other different kinds of databases. StreamSets is a great solution because you don't need to know all different databases or all different coding mechanisms to work with StreamSets. Rather than learning each and every technology and building your data pipelines, you can just plug and play at a faster pace.

StreamSets’ built-in data drift resilience plays a part in our ETL operations. It is a very helpful feature. Previously, we had a lot of jobs coming from different source systems, and whenever there was any change in columns, we were not informed. It required a lot of changes on our end, which would take from a couple of weeks to a month. Because of the data drift feature, which is embedded in StreamSets, we don't have to spend that much time taking care of the columns and making sure they are in sync. All this is taken care of. We don't have to worry about it. It is a very helpful feature to have.

StreamSets' data drift resilience reduces the time to fix data drift breakages. It has definitely saved around two to three weeks of development time. Previously, any kind of changes in our jobs used to require changing our code or table structure and doing some testing. It required at least two to three weeks of effort, which is now taken care of because of StreamSets.
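To illustrate what handling data drift means in practice, here is a minimal sketch, assuming a simple SQLite target table; it is conceptual only and not how StreamSets implements the feature. When a column such as the hypothetical 'region' field appears in the source, the loader adds it to the target before inserting, so the pipeline keeps running without a code change.

```python
# Minimal schema-drift handling sketch (conceptual, not StreamSets internals).
# Before loading a batch, compare incoming record fields with the target
# table's columns and add any new columns so the load succeeds unchanged.
import sqlite3

def load_with_drift_handling(conn, table, records):
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    incoming = {key for record in records for key in record}

    # Columns that appeared upstream but are not in the target yet.
    for column in sorted(incoming - existing):
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} TEXT")

    for record in records:
        cols = ", ".join(record)
        placeholders = ", ".join("?" for _ in record)
        conn.execute(f"INSERT INTO {table} ({cols}) VALUES ({placeholders})",
                     list(record.values()))
    conn.commit()

# Example: the second batch carries an extra 'region' column added upstream.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount TEXT)")
load_with_drift_handling(conn, "orders", [{"order_id": "1", "amount": "10"}])
load_with_drift_handling(conn, "orders", [{"order_id": "2", "amount": "7", "region": "EMEA"}])
```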

StreamSets’ reusable assets helped to reduce workload. We can use pipeline fragments across multiple projects, which saves development time. The time saved varies from team to team.

It saves us money by not having to hire people with specialized skills. Without StreamSets, for example, I would've had to hire someone to work on Teradata or Db2. We definitely save some money on creating a new position or hiring a new developer. StreamSets provides a lot of features from AWS, Azure, or Snowflake. So, we don't have to find specialized, skilled resources for each of these technologies to create data pipelines. We just need to have StreamSets and one or two DBAs from each team to get the right configuration items, and we can just use it. We don't have to find a specialized resource for each database or technology.

It has helped us to scale our data operations. It saves the licensing costs on some legacy software, and we can reuse pipelines. Once we have a template for a certain use case, we can reuse the same template across different projects to move data to the cloud, which saves us money.

What is most valuable?

I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. Within a couple of weeks, they would be as comfortable as any technical person. I really like its user-friendliness. It is easy to use. They have a single snapshot across different products, which is very helpful to learn and use the product based on your use case.

Its interface is very cool. If I'm using a batch project or an ETL, I just have to configure appropriate stages. It is the same process if you go with streaming. The only difference is that the stages will change. For example, in a batch, you might connect to Oracle Database, or in streaming, you may connect to Kafka or something like that. The process is the same, and the look-and-feel is the same. The interface is the same across different use cases.

It is a great product if you are looking to ramp up your teams and you are working with different databases or different transformations. Even if you don't have any skilled developers in Spark, Python, Java, or any kind of database, you can still use this product to ramp up your team and scale up your data migration to cloud or data analytics. It is a fantastic product.

What needs improvement?

There are a few things that can be better. We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back.

There are certain features that are only available at certain stages. For example, HTTP Client has some great features when it is used as a processor, but those features are not available in HTTP Client as a destination.

There could be some improvements on the group side. Currently, if I want to know which users are a part of certain groups, it is not straightforward to see. You have to go to each and every user and check the groups he or she is a part of. They could improve it in that direction. Currently, we have to put in a manual effort. In case something goes wrong, we have to go to each and every user account to check whether he or she is a part of a certain group or not.

For how long have I used the solution?

I got exposed to StreamSets in late 2018. Initially, I worked on StreamSets Data Collector, and then, for a year or so, I got exposed to Transformer as well.

What do I think about the stability of the solution?

It is stable, and they're growing rapidly.

What do I think about the scalability of the solution?

It is pretty scalable, but it also depends on where it is installed, which is something a lot of developers misunderstand. Most of the time, the implementation is done on on-prem servers, which is not very scalable. If you install it on cloud-based servers, it is fast. So, the problem is not with StreamSets; the problem is with the underlying hardware. I have worked on both sides. Therefore, I'm aware of the scenarios, but if I were to work purely in the development team, I might not be aware that it is underlying hardware that is causing problems.

In terms of its usage, it is available enterprise-wide. I don't know the exact number of users now because I am not a part of the platform or admin team, but at one time, we had more than 200 users working on this platform. We had one implementation on AWS Cloud and one on GCP. We had Dev, QA, and prod environments. Even now, we have about four environments. We have SIT and NFT, and in prod, we have two environments.

We plan to increase its usage. We are rapidly increasing its usage in our projects. There is a lot of excitement around it. A lot of people want to explore this tool in our organization. A lot of people are trying to learn this technology or use it to migrate their data from legacy databases to the cloud. This will actually encourage more folks to join the data engineering or analytics team. There is a lot of curiosity around the product.

How are customer service and support?

Currently, I'm not involved with them on a daily basis. I'm no longer a part of the platform team, but when I was involved with them two years back, their support was good. Most of the interactions I have had with them were pretty good. They were responsive, and they responded within a day or two. I would rate them a nine out of ten. They were good most of the time, but it could be a challenge to get the right person. They are still a growing company. You need to be a little patient with them to get to the right person to help you with the issues you have.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

About three or four years ago, I worked on Trifacta, which has since been acquired by Alteryx. The features were different, and the requirements were different.

Talend is a good product. It seems quite close to StreamSets, but I have not worked on Talend. I just got a demo of Talend a couple of years ago, but I never worked on it. I felt that StreamSets had more features. Its UI was good, and functionality-wise, I found it a little bit more comfortable to use.

How was the initial setup?

I was involved with AWS deployment. At that time, I was a part of the platform team. Now, I work with the application development team, and I'm not involved in that. It was complex at that time. About four years ago, when StreamSets was new, we had a tough time deploying because the documentation was not very clear at that time. A lot of the documents were very good and available on the web, but the documentation wasn't exhaustive or elaborate. We also had our own learning curve. We had someone from StreamSets to help us with the deployment. So, it went well. Now, it is better, but when we did it, it was very complex.

We implemented it in phases. We just implemented or installed the StreamSets platform in our company, and we let a couple of teams use it. We started with Data Collector, and we allowed teams to use it and get a feel for it. When they said that this is a good tool to use, we got the enterprise license, and we installed Control Hub and Data Collector. It was not implemented enterprise-wide at the same time. It was released to teams in phases.

What about the implementation team?

It was a mix of a consultant and reseller. It probably was Access Group that helped us with this implementation. At that time, I was in the US, and they were good. Our experience with them was fantastic. We had a couple of consultants from their team to help us with the installation. Now, we have a different vendor in the UK. We have a different partner to help us with that.

We started with about three people, and now, we have more than 20 people on the team. It requires regular maintenance in terms of user management. It is not because of StreamSets; it is because of the underlying software. Data Collector can support a certain number of jobs in parallel. In case we have more tenants on board, we have to increase the Data Collector or Transformer instances to support the increased number of users. 

What was our ROI?

We have definitely seen an ROI. It has helped us in moving into the data analytics world at a faster pace than any other tool would've done. The traditional tools we had didn't provide the functionality that StreamSets offers.

The time for realizing its benefits from deployment depends on the use case or the end requirement. For example, we deployed one project last year, and within a couple of months, we could see a lot of benefits for that team. For some use cases, it could be two months to six months or one year. You can build data pipelines, and you can move data to Snowflake or any cloud database using StreamSets in a matter of a few weeks.

What's my experience with pricing, setup cost, and licensing?

There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it.

The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets.

What other advice do I have?

It is very user-friendly, and I promote it big time in my organization among my peers, my juniors, and across different departments. 

They're growing rapidly. I can see them having a lot of growth based on the features they are bringing. They could capture a lot more market in coming times. They're providing a lot of new features.

I love the way they are constantly upgrading and improving the product. They're working on the product, and they're upgrading it to close the gaps. They have developed a data portal recently, and they have made it free. Anyone who doesn't know StreamSets can just create an account and start using that portal. It is a great initiative. I learned directly on the corporate portal license, but if I were to train somebody in my team who doesn't yet have a license, I would just recommend them to go to the free portal, register, and learn how to use StreamSets. It is available for anyone who wants to learn how to work on the tool.

We use StreamSets' ability to move data into modern analytics platforms. We use it for Tableau, and we use it for ThoughtSpot. It is quite easy to move data into these analytics platforms. It is not very complicated. The problems that we had were mostly outside of StreamSets. For example, most of our databases were on-prem, and StreamSets was installed on the cloud, such as AWS Cloud. There were some issues with that. It wasn't a drawback because of StreamSets. It was pretty straightforward to plug and play.

I have used StreamSets Transformer, but I haven't yet used it with Snowflake. We are planning to use it. We have a couple of use cases we are trying to migrate to Snowflake. I've seen a couple of demos, and I found it to be very easy to use. I didn't see any complications there. It is a great product with the integration of StreamSets Transformer and Snowflake. When we move data from legacy databases to Snowflake, I anticipate there could be a lot of data drift. There could be some column mismatches or table mismatches, but what I saw in the demo was really fantastic because it was creating tables during runtime. It was creating or taking care of the missing columns at runtime. It is a great feature to have, and it will definitely be helpful because we will be migrating our databases to Snowflake on the cloud. It will definitely help us meet our customer goals at a faster pace. 

I would rate it a nine out of ten. They're improving it a lot, and they need to improve a lot, but it is a great product to use.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Developer, Architect at L'Oreal
Real User
Top 20
Stable, provides good support, and integrating it with other systems is very fast, but its pricing is expensive
Pros and Cons
  • "What I like most about Informatica PowerCenter is that it's the best tool in the market for data integration. Currently, I work in L'Oréal, where a new system from SAP is used. Informatica PowerCenter integration with SAP is very, very fast and very, very simple, so you have the server flow from SAP, and through Informatica PowerCenter, you can ingest the data and make that data available for the business more quickly."
  • "What needs improvement in Informatica PowerCenter is the cloud experience because, nowadays, other companies, such as AWS, Azure, and Google, have more experience in the cloud. The pricing for Informatica PowerCenter on the cloud is also very expensive for customers, so some customers prefer open-source tools or lower-priced tools, such as Azure. From my point of view, Informatica must work on the pricing policy and review the policy on the cloud for Informatica PowerCenter or propose more tools with lower pricing. Clients want the automatic integration of Informatica PowerCenter with other tools. Currently, the integration process is manual, and you have to add other tools to facilitate the integration, especially with the DevOps methodology. You need scripts and tools for the integration, and you'll need to use other integration tools if you want automatic deployment for Informatica PowerCenter, so this is another area for improvement in the solution. What I'd like to see in the next release of the solution is for the integration with APIs to be simpler, because currently, the API integration feature of Informatica PowerCenter is very difficult. It's not intuitive. You have to facilitate API integration and the real-time streaming of messages in Kafka, for example, so that should be improved."

What is our primary use case?

We have several use cases for Informatica PowerCenter. We ingest data from different sources, for example, from APIs or other databases. We also apply data transformations in almost all cases via Informatica PowerCenter, to aggregate the data. We also check the integrity of the data and improve the data quality by applying some rules. We specify and identify all the rules and the methodology to apply the rules. We also participate in rule setting and check the test cycles for the business.

What is most valuable?

What I like most about Informatica PowerCenter is that it's the best tool in the market for data integration. Currently, I work in L'Oréal, where a new system from SAP is used. Informatica PowerCenter integration with SAP is very, very fast and very, very simple, so you have the server flow from SAP, and through Informatica PowerCenter, you can ingest the data and make that data available for the business more quickly.

What needs improvement?

What needs improvement in Informatica PowerCenter is the cloud experience because, nowadays, other companies, such as AWS, Azure, and Google, have more experience in the cloud. The pricing for Informatica PowerCenter on the cloud is also very expensive for customers, so some customers prefer open-source tools or lower-priced tools, such as Azure. From my point of view, Informatica must work on the pricing policy and review the policy on the cloud for Informatica PowerCenter or propose more tools with lower pricing.

Clients want the automatic integration of Informatica PowerCenter with other tools. Currently, the integration process is manual, and you have to add other tools to facilitate the integration, especially with the DevOps methodology. You need scripts and tools for the integration, and you'll need to use other integration tools if you want automatic deployment for Informatica PowerCenter, so this is another area for improvement in the solution.

What I'd like to see in the next release of the solution is for the integration with APIs to be simpler, because currently, the API integration feature of Informatica PowerCenter is very difficult. It's not intuitive. You have to facilitate API integration and the real-time streaming of messages in Kafka, for example, so that should be improved.

For how long have I used the solution?

I've been using Informatica PowerCenter for ten years now.

What do I think about the stability of the solution?

Informatica PowerCenter is a stable product. For example, last year my team applied some hotfix to resolve some problems, but it's very rare to experience problems from the solution.

What do I think about the scalability of the solution?

As we work on on-premise environments and the environments are fixed, we're unable to check how scalable Informatica PowerCenter is.

How are customer service and support?

Informatica PowerCenter provides good client support. My team got in touch with support when there was an issue with the solution, and the Informatica PowerCenter support team communicated how to install the hotfix, and the issue was solved.

On a scale of one to five, I'm rating support a four. I didn't give it a five because sometimes, support isn't that qualified in terms of explaining the product and the problem. Informatica PowerCenter support is available, though. For example, I worked with the support team last year, and I received a call two hours after opening the ticket. The support team answers quickly, but what can take time is understanding and explaining the problem and the solution. Sometimes, it could take three or four days before the problem is resolved because, after working on the ticket, it would be escalated to another team, and that takes time.

How was the initial setup?

The initial setup for Informatica PowerCenter is complex, especially if you don't have a lot of experience with the solution. The deployment could take one day to complete, but in more sophisticated environments, it could take a week, so it depends.

What about the implementation team?

We implemented Informatica PowerCenter through an in-house team.

What's my experience with pricing, setup cost, and licensing?

The pricing for Informatica PowerCenter is expensive. Some clients have simple needs and only want to integrate and store data. Some clients have small needs, while some clients have bigger needs and focus more on performance and time. The licensing cost for Informatica PowerCenter isn't good.

In my project, if I remember correctly, the company pays €20,000 per year for the Informatica PowerCenter license. I'm not aware of additional costs that you need to pay for some features of the solution.

On a scale of one to five, I'm rating its price a three.

Which other solutions did I evaluate?

I've evaluated Azure, Talend, and Data Exchange, but I found Informatica PowerCenter to be better than others. It's just that Informatica PowerCenter needs to have more progress on the cloud side.

What other advice do I have?

I'm an expert on Informatica and some cloud integration tools, such as AWS and Azure.

I have experience with Informatica PowerCenter, Informatica PowerExchange, and also some data quality and master data management tools.

I'm currently using version 10.4 of Informatica PowerCenter.

The solution is deployed on-premises, but I also have experience with its cloud version.

Informatica PowerCenter requires maintenance because you have to take care of the environment, clean the logs, and check some CPU metrics, for example, the memory. The solution also requires some maintenance, for example, you need to restart it when the service is down, so maintaining it uses up your time.
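As a rough illustration of that kind of routine housekeeping, here is a small Python sketch; the log directory and thresholds are entirely hypothetical placeholders, not PowerCenter-specific paths or commands, and the memory check assumes a Linux host.

```python
# Hypothetical housekeeping sketch: prune old logs and flag memory pressure.
# Paths and thresholds are placeholders, not PowerCenter-specific settings.
import time
from pathlib import Path

LOG_DIR = Path("/opt/etl/logs")    # hypothetical log location
MAX_LOG_AGE_DAYS = 30
MEMORY_ALERT_RATIO = 0.90          # warn when 90% of memory is in use

def prune_old_logs():
    cutoff = time.time() - MAX_LOG_AGE_DAYS * 86400
    for log_file in LOG_DIR.glob("*.log"):
        if log_file.stat().st_mtime < cutoff:
            log_file.unlink()

def memory_pressure_high():
    # Linux-only stand-in: read /proc/meminfo instead of a monitoring agent.
    meminfo = dict(
        line.split(":", 1) for line in Path("/proc/meminfo").read_text().splitlines()
    )
    total = int(meminfo["MemTotal"].split()[0])
    available = int(meminfo["MemAvailable"].split()[0])
    return (total - available) / total > MEMORY_ALERT_RATIO

if __name__ == "__main__":
    prune_old_logs()
    if memory_pressure_high():
        print("Memory pressure is high: investigate or restart the service.")
```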

You can have a team dedicated to Informatica PowerCenter maintenance, but that depends on the client. Some clients have dedicated teams for it, while some ask that you take care of the maintenance of the solution.

My team has ten people working on Informatica PowerCenter, but I do know that other teams work on the solution as well, particularly in integration, but I don't have information on how many Informatica PowerCenter users those teams have.

My advice to anyone looking into implementing the solution is to work on it or develop step by step. You also need a good concept to develop complex flows and mapping. You need to develop and test simultaneously to safely deliver good mapping and good products for your clients. You have to test. You have to invest in testing.

I'd rate Informatica PowerCenter seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Jacopo Zaccariotto - PeerSpot reviewer
Head of Data Engineering at InfoCert
Real User
Top 20
The drag-and-drop interface makes it easier to use than some competing products
Pros and Cons
  • "We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice."
  • "The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode."

What is our primary use case?

We use Pentaho for small ETL integration jobs and cross-storage analytics. It's nothing too major. We have it deployed on-premise, and we are still on the free version of the product.

In our case, processing takes place on the virtual machine where we installed Pentaho. We can ingest data from different on-premises and cloud locations. We still don't carry out the data processing phase inside a different environment from where the VM is running.

How has it helped my organization?

At the start of my team's journey at the company, it was difficult to do cross-platform storage analytics. That means ingesting data from different analytics sources inside a single storage machine and building out KPIs and some other analytics. 

Pentaho was a good start because we can create different connections and import data. We can then do some global queries on that data from various sources. We've been able to replace some of our other data tools, like Talend, for managing our data warehouse workflows. Later, we adopted some other cloud technologies, so we don't primarily use Pentaho for those use cases anymore.

What is most valuable?

Pentaho is flexible with a drag-and-drop interface that makes it easier to use than some other ETL products. For example, the full stack we are using in AWS does not have drag-and-drop functionality. Pentaho was a good option at the start of this journey.

We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice.

What needs improvement?

It's difficult to use custom code. Implementing a pipeline with pre-built blocks is straightforward, but it's harder to insert custom code inside the pre-built blocks. The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode.

Repository management is also a shortcoming, but I'm not sure if that's just a limitation of the free version. I'm not sure if Pentaho can use an external repository. It's a flat-file repository inside a virtual machine. Back in the day, we would want to deploy this repository on a database.

Pentaho's data management covers ingestion and insights but I'm not sure if it's end-to-end management—at least not in the free version we are using—because some of the intermediate steps are missing, like data cataloging and data governance features. This is the weak spot of our Pentaho version.

For how long have I used the solution?

We implemented Hitachi Pentaho some time ago. We have been using it for around five or six years. I was using the product at the time, but now I am the head of the data engineering team, so I don't use it anymore but I know Pentaho's strengths and weaknesses.

What do I think about the stability of the solution?

Pentaho is relatively stable, but I average about one failed job every month. 

What do I think about the scalability of the solution?

I rate Pentaho six out of 10 for scalability. The scalability depends on how you deploy it. In our case, the on-premise virtual machine is relatively small and doesn't have a lot of resources. That is why Pentaho does not handle big datasets well in our case. 

I'm also unsure if we can deploy Pentaho in the cloud. So when you're not dealing with the cloud, scalability is always limited. We cannot indefinitely pump resources into a virtual machine.

Currently, we have five or six active workflows running each night. Some of them are ingesting data from ADU. Others take data from AWS Redshift or on-premise Oracle. In terms of people, three other people on the data engineering team and I are actively using Pentaho.

Which solution did I use previously and why did I switch?

We used Talend, which is a Java-based solution and is made for people with proficiency in Java. The entire analytics ecosystem is transitioning to more flexible runtimes, including Python and other languages. Java was not ideal for our data analytics journey.

Right now, we are using NiFi, a tool in the cloud ecosystem that has a similar drag-and-drop interface, but it's embedded in the ADU framework. We're also using another drag-and-drop tool on AWS, but not AWS Glue Studio. 

What was our ROI?

We've seen a 50 percent reduction in our ETL development time using the free version of Pentaho. That saves about 1,000 euros per week, so at least 50,000 euros annually. 

What other advice do I have?

I rate Pentaho eight out of 10. It's a perfect pick for data teams that are getting started and more business-oriented data teams. It's good for a data analyst who isn't so tech-savvy. It is flexible and easy to use. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Tirthankar Roy Chowdhury - PeerSpot reviewer
Team Lead at Tata Consultancy Services
Real User
Top 5 Leaderboard
User-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features
Pros and Cons
  • "The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
  • "What needs improvement in IBM InfoSphere DataStage is its pricing. The pricing for the solution is higher than its competitors, so a lot of the clients my company has worked with prefer other tools over IBM InfoSphere DataStage because of the high price tag. Another area for improvement in the solution stems from a lot of new types of databases, for example, databases in the cloud and big data have become available, and IBM InfoSphere DataStage is working on various connectors for different data sources, but that still isn't up-to-date, meaning that some connectors are missing for modern data sources. The latest version of IBM InfoSphere DataStage also has a complex architecture, so my team faced frequent outages and that should be improved as well."

What is our primary use case?

IBM InfoSphere DataStage was mostly used for ETL and data integration purposes, so extract, transform, and load, including some data quality use cases. My team used the solution to extract data from various sources, do some business transformations, and load the data into a target database or generate files.

What is most valuable?

The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities.

What needs improvement?

What needs improvement in IBM InfoSphere DataStage is its pricing. The pricing for the solution is higher than its competitors, so a lot of the clients my company has worked with prefer other tools over IBM InfoSphere DataStage because of the high price tag.

Another area for improvement in the solution stems from the many new types of databases that have become available, for example, databases in the cloud and big data platforms. IBM InfoSphere DataStage is working on various connectors for different data sources, but it still isn't up-to-date, meaning that some connectors are missing for modern data sources.

The latest version of IBM InfoSphere DataStage also has a complex architecture, so my team faced frequent outages and that should be improved as well.

For how long have I used the solution?

I've been working with IBM InfoSphere DataStage for more than seven years.

What do I think about the stability of the solution?

IBM InfoSphere DataStage is a stable product and it's been in the market for quite some time, but in its latest version, there's been some instability caused by the new features introduced in the solution. The architecture was changed a lot, and that was causing issues and frequent outages, so my company had to go back to IBM for troubleshooting. My team didn't face issues in the earlier version of IBM InfoSphere DataStage. It was the latest version that had instability issues.

What do I think about the scalability of the solution?

IBM InfoSphere DataStage is a very scalable product.

How are customer service and support?

IBM InfoSphere DataStage has pretty good technical support, but with the new version, particularly the new architecture and the microservices concept, it sometimes takes a bit of time, even for the IBM team, to figure out what's wrong. Once that's been figured out, the team comes up with a solution or a patch.

How was the initial setup?

Setting up IBM InfoSphere DataStage was easy.

How long the deployment takes would depend on certain factors, but it usually takes just two to three hours.

What's my experience with pricing, setup cost, and licensing?

I have no information on the exact pricing for IBM InfoSphere DataStage because the solution is usually procured by the clients my company works with, though the pricing is higher compared to other solutions, so many clients choose to go with a different solution rather than IBM InfoSphere DataStage.

What other advice do I have?

The last version of IBM InfoSphere DataStage which I've worked with was version 11.7.

I work for an IT service company that works with multiple clients on multiple projects, so close to two hundred people use IBM InfoSphere DataStage for various clients.

Per project, on average, three people take care of IBM InfoSphere DataStage deployment, maintenance, and support-related activities.

My advice to people looking into implementing IBM InfoSphere DataStage is that it's a very good product. A lot of similar products have come up nowadays, but this product has a pretty good reputation as it's been in the market for quite a while. I do think other products such as Talend, Informatica PowerCenter, and Informatica Data Quality are better than IBM InfoSphere DataStage.

My rating for IBM InfoSphere DataStage is eight out of ten.

My company has a partnership with IBM.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner