
Read reviews of Talend Cloud Integration alternatives and competitors

Managing Director at a consultancy with 11-50 employees
Reseller
Top 10
Frees staff to focus on data workflow and on what can be done with data, and away from the details of the technology
Pros and Cons
  • "It's a really powerful platform in terms of the combination of technologies they've developed and integrated together, out-of-the-box. The combination of Kafka and Spark is, we believe, quite unique, combined with CDC capabilities. And then, of course, there are the performance aspects. As an overall package, it's a very powerful data integration, migration, and replication tool."
  • "It's got it all, from end-to-end. It's the glue. There are a lot of other products out there, good products, but there's always a little bit of something missing from the other products. Equalum did its research well and understood the requirements of large enterprise and governments in terms of one tool to rule them all, from a data migration integration perspective."
  • "They need to expand their capabilities in some of the targets, as well as source connectors, and native connectors for a number of large data sources and databases. That's a huge challenge for every company in this area, not just Equalum."

What is our primary use case?

Equalum is used to take legacy data, siloed data and information in the enterprise, and integrate it into a consolidated database that is then used, for the most part, for data transactions, whether that's an actual transaction within the enterprise itself or a BI dashboard. BI dashboards and analytics are a big area, but overall the main use case is large-scale data integration across disparate data sources.

The way it's deployed is a combination of on-premises and cloud. It's mainly on-prem, but in Japan and Korea, the adoption of cloud is definitely nowhere near the same as in North America or even Europe. So most of it is deployed on-prem, but there is some hybrid connectivity, in terms of targets being in the cloud and legacy sources being on-prem. Next-gen infrastructure is hosted in the cloud, either by a global partner like AWS, or by their own data infrastructure, in more of a hosted scenario. 

The other model that we're looking at with a couple of partners is a managed service model, whereby Equalum is hosted in the partner's cloud and they provide multi-tenancy for their mainly SME customers.

How has it helped my organization?

The advantage of having a no-code UI, with Kafka and Spark fully managed in the platform engine, comes down to human resources. The number of engineers, and the time, it takes to both develop something of this nature yourself and then maintain it is significant. It's not easy. Even if you do have a boatload of engineers and a homegrown capability for working with open source Spark or Kafka, trying to integrate them, along with multiple other open source technologies, in one platform is a major challenge. Purely from the point of view of getting up to speed, out-of-the-box, you can literally spin up an instance of Equalum within 30 minutes. Anybody who has tried to deal with Kafka as well as Spark, and then tries these technologies, is quite blown away by how quick and easy it is to get moving. You can realize ROI much faster with Equalum.

Equalum allows a lot of staff to focus on what they know and what they can do, versus having to learn something from scratch, and that lowers the overall risk in a project. Lowering the risk profile is one of the key benefits as well.

Another benefit is that it frees staff to focus on the consultation aspects, the data workflow and mapping of an organization's data, and understanding what one can do with that data. It enables them to focus on the real value, instead of getting into the nitty-gritty of the technology. The added value is immediate for companies deploying this tool, versus having to develop and maintain their own.

It gives you the ability to handle those data transformations and migrations on-the-fly. It would take a huge effort if you had to do that manually or develop your own tool. The overall legacy situation of databases in most organizations does not allow for real-time crunching of information. Without Equalum, it would be impossible to implement those kinds of data-related processes, unless you have a tool that can really perform at that level and speed. That's what this solution drives: data transformation inside an enterprise, getting it to go from legacy databases to future database capabilities, from batch to real-time.

What is most valuable?

It's a really powerful platform in terms of the combination of technologies they've developed and integrated together, out-of-the-box. The combination of Kafka and Spark is, we believe, quite unique, combined with CDC capabilities. And then, of course, there are the performance aspects. As an overall package, it's a very powerful data integration, migration, and replication tool. We've looked at a number of other products, but Equalum, from a technology perspective, comes out head and shoulders above the rest. We tend to focus mainly on trying to move legacy businesses, here in Japan and in Korea, which are very slow to move, from batch and micro-batch to real-time. The combination of those technologies I just mentioned is really powerful.

It also stands out, very much so, in terms of its ease of use. It's super-simple to use. It has its own Excel-type language, so as long as you know how to use Excel, in terms of data transformation, you can use this tool. And we're talking about being able to do massive data integration and transformation. That's not even referring to the drag-and-drop capabilities, which are for people who have zero skills; even for them, it's that easy. But if somebody does want to do some customization, it has a CLI based on the solution's own code transformations, which are as easy as an Excel-type command. And they've got all the documentation for that.

For consultants, it's a dream tool. A large consultancy practice providing services to large enterprises can make a boatload of money: instead of consultants spending hours and hours developing workflows, they can implement them right away, and then copy those workflows across organizations, or inside the same organization. You create a drag-and-drop scenario once, or a CLI once, and you can use that in multiple situations. It's a powerful tool from a number of angles, but ease of use is definitely one of them.

In addition, it's a single platform for core architectural use cases: CDC replication, streaming ETL, and batch ETL. It also has micro-batch. It's got it all, from end-to-end. It's the glue. There are a lot of other products out there, good products, but there's always a little bit of something missing from the other products. Equalum did its research well and understood the requirements of large enterprise and governments in terms of one tool to rule them all, from a data migration integration perspective.

The speed of data delivery is super-fast. In some cases, when you look at the timestamp of data in the target versus the source, it's down to the hundredths of a second and it's exactly the same number. That's at the highest level, but it's super-fast. It's lightning fast in terms of its ability to handle data.
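To make that spot check concrete, here is a generic sketch of how one could compare source and target timestamps; it is not Equalum-specific, and the table name, the updated_at column, and the connection details are all hypothetical. It also assumes both databases are reachable with psycopg2; in practice the source driver would match the source database.

```python
# Hypothetical spot check of replication lag: newest row timestamp on the
# source vs. the target. Names and connections are illustrative only.
import psycopg2

def max_ts(conn, table):
    with conn.cursor() as cur:
        cur.execute(f"SELECT MAX(updated_at) FROM {table}")  # trusted table name
        return cur.fetchone()[0]

source = psycopg2.connect("host=source-db dbname=src")
target = psycopg2.connect("host=target-db dbname=tgt")

lag = max_ts(source, "orders") - max_ts(target, "orders")
print(f"replication lag on orders: {lag.total_seconds():.2f}s")
```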

What needs improvement?

There are areas they can do better in, like most software companies that are still relatively young. They need to expand their capabilities in some of the targets, as well as source connectors, and native connectors for a number of large data sources and databases. That's a huge challenge for every company in this area, not just Equalum.

If I had the wherewithal to create a tool that allowed for all that connectivity out-of-the-box, it would be massive. There are updates every month, and open source changes constantly, so maintaining compatibility with these sources and targets is not easy. And a lot of targets are proprietary, and their vendors actually don't want you to connect with them in real time. They want to keep that connectivity for their own competitive tool.

What happens is that a customer will say, "Okay, I've got Oracle, and I've got MariaDB, and I've got SQL Server over here, and I've got something else over there. And I want to aggregate that, and put it into Google Cloud Platform." Having connectors to all of those is extremely difficult, as is maintaining them.

So there are major challenges to keeping connectivity to those data sources, especially at a CDC level, because you've got to maintain your connectors. And every change that's made with a new version that comes out means they've got to upgrade their version of the connector. It's a real challenge in the industry. But one good thing about Equalum is that they're up for the challenge. If there's a customer opportunity, they will develop and make sure that they update a connector to meet the needs of the customer. They'll also look at custom development of connectors, based on the customer opportunity. It's a work in progress. Everybody in the space is in the same boat. And it's not just ETL tools. It's everybody in the Big Data space. It's a challenge.

The other area for improvement, for Equalum, is their documentation of the product. But that comes with being a certain size and having a marketing team of 30 or 40 people and growing as an organization. They're getting there and I believe they know what the deficiencies are. Maintaining and driving a channel business, like Equalum is doing, is really quite a different business model than the direct-sales model. It requires a tremendous amount of documentation, marketing information, and educational information. It's not easy.

For how long have I used the solution?

We've been involved with Equalum for about 18 months.

We're partners with Equalum. We resell it into Japan and Korea. The model in North Asia is a reseller/channel model. The majority of our activity is signing resellers and managing the channels for Equalum into markets. We're like an extension of their business, providing them with market entry into North Asia.

We've been reselling every version from that day, 18 months ago, up until now. We've been up to version 2.23, which is the most recent. We're reselling the latest version and installing it on channel partners' technology infrastructure. We tend to use the most recent version, as long as there are no major bugs. So far, that's been the case. We've been installing the most recent product at that moment, without any issues, and the reseller is then using that for internal PoCs or other ones.

What do I think about the stability of the solution?

The stability of Equalum is very good. We deal with double-byte character sets in Japan, so there are little things here and there to deal with when new versions come, but there are basically no issues at all. It's very solid. It's running in multi-billion dollar organizations around the world. I haven't heard any complaints at all. The stability factor seems to be very good.

What do I think about the scalability of the solution?

The scalability is the beauty of this product. It scales both vertically and horizontally, and it provides ease of scalability. You've got the ability to add CPUs to its hardware servers, vertically, and you've got the ability to add additional nodes to a cluster, horizontally. It's really good for continuing to build larger and larger clusters to handle larger and larger datasets. It does require manual intervention to scale horizontally. Vertically, it is literally just a matter of adding hardware to the rack, but horizontal scaling does take some professional services. It's not like pressing a button and, presto. It's not a cloud-based, AWS-type of environment at this time. But that's fine because sometimes, with this kind of data and these types of customer environments, you definitely want to understand what you're dealing with. You've got hundreds, if not thousands, of workflows. It's not something as simple as just clicking a button.

The scalability is a really interesting component and is a reason we are very interested in working with Equalum. It's easy to expand. Obviously, we're in the business to generate revenue. So if we can get adoption of the tool inside an organization, and they continue to add more and more data to the overall infrastructure's architecture, all we need to do is expand the scalability of the solution and we generate additional revenue.

How are customer service and technical support?

Their technical support is fantastic. They have support in Houston and Tel Aviv. We mainly deal with Tel Aviv because it's a much better time zone for us here in Japan and Korea. We use Slack with them. And I was just talking with them this morning about a new support portal they've just released, that we're going to have full access to. 

We carry Equalum business cards and Equalum email addresses. We really are like an extension of the business in Asia, even though we're a separate entity. So when we communicate with the channel partners, we use Equalum email addresses and in doing so we also then answer technical support requests. Our own technical team is going to be integrated into the support portal at some point very soon so that our teams in Korea and Japan can handle questions coming from the customers in the language of the country. We'll be a second line of support to our resellers and, if necessary, of course, it's easy to escalate to the third line support in the U.S. or in Tel Aviv, 24/7.

Which solution did I use previously and why did I switch?

We looked at a few, but we haven't actually worked with another ETL vendor. We had good relations with a number, but we never actually took on an ETL tool in the past.

We dealt in other solution offerings: BI, real-time business intelligence, and database infrastructure products. We did have some interaction with ETL vendors, and we weren't really impressed; there wasn't anything that could keep up with the kind of data speeds we were trying to process. I came across Equalum at a Big Data event back in 2019. I walked up to their booth and started chatting with Moti, who is the head of global sales. At the time, we were looking for something along these lines to combine with our current offering.

This is the first time we have taken on a tool of this nature, and it has become the core of our sales approach. We lead with Equalum because you first need to get access to the data. Once you transform the data, you can land it in a database, so we sell another database product. And after we've landed the data in the database target, we connect it with business analytics and AI tools or platforms, which we also sell. It has become the core, the starting point, of all of our sales activities.

How was the initial setup?

The difficulty level of the initial setup depends on the skills of the engineer who's involved. It now takes our guys a maximum of 30 minutes to deploy it, but the first time they did it, it took a little bit of time to go through it and to understand how it's done. Now, it's really easy, but that first time requires a little bit of hand holding with Equalum's engineers. It's relatively easy once you go through it the first time.

We don't do implementations at the customer location. That's what our partners do. But we support that and we help them with that. Getting the software up and running has been pretty easy. The challenges are around connecting to the database of the customers and getting through VPNs and the like. That's the challenge of getting into any enterprise infrastructure.

What was our ROI?

In terms of savings based on insights enabled through Equalum's streaming ETL, we're not a customer, so we haven't seen the savings. And gathering ROI on these kinds of topics is always difficult, even if you are talking to a customer. But it all comes down to the cost of the technology and the cost of human resources to develop it, maintain it, and manage it, and the project it's being used for. But those savings are certainly the reason our resellers and their customers are looking at acquiring the tool.

What's my experience with pricing, setup cost, and licensing?

They have a very simple approach to licensing. They don't get tied up with different types of connectivity to different databases. If you need more connectors or if you need more CPU, you just add on. It's component-based pricing. It's a really easy model for us and, so far, it's worked well. 

The actual pricing is relative. You talk to some people and they say, "Oh, it's too expensive." And then you talk to other people and they say, "Whoa, that's cheap." It's a case-by-case issue.

For the most part, when we go up against local CDC products, we're always priced higher, but when we go up against big, global ETL vendors, we're priced lower. It comes down to what the customer needs. Is what we have overkill for them? Do they really just need something smaller and less expensive? If pricing is a problem, then we're probably in the wrong game or talking to the wrong people.

Overall, we feel that the pricing is reasonable. We just hope that they don't increase the price too much going forward.

Which other solutions did I evaluate?

We had looked at the likes of Talend and Informatica, of course, and there are so many local Japanese and Korean CDC products. We didn't go really heavily in-depth into that sector because, as soon as we saw and understood what Equalum had to offer, we liked the people we were dealing with, they liked our approach, and they were interested as well. Frankly, we didn't see anybody that had the same combination of technologies and that wasn't already in North Asia.

When you get into the nitty-gritty, a lot of the functionalities, such as drag-and-drop capabilities and workflows, are similar to other products, but it's the overall package in one tool that is very powerful.

And in terms of consolidating and reducing the number of tools you use, if you look at Informatica, you need four products from Informatica to do what the one product from Equalum does. There's serious consolidation. There are definitely a lot of ETL products that don't do CDC, so you have to buy two products. The whole concept of having one tool handle most of the requirements is a strong selling point of Equalum, and it's good for customer adoption. The side note to that is that if customers have already spent millions of dollars with their current tool, it becomes difficult for them to adopt Equalum. 

What other advice do I have?

It's a great tool to use in any data transformation opportunity, especially focusing on real-time. The word "batch" should never be used in an organization, going forward. I know it's useful and it has its use cases and there are situations where it's helpful. Batch is a form of history and it will probably always be there until that legacy finally disappears. But, overall, if anybody wants to look at migrating and transforming their overall data into a real-time enterprise, there's not a better tool in the market today, in terms of its performance, price, usability, and support. Those four things are the reasons we're selling it.

The biggest thing I have learned from working with Equalum is how difficult it is to actually manage your own Spark and Kafka clusters, and to process data at speed. It's difficult to have the infrastructure and all of the other software to combine everything. In some organizations the effort takes hundreds of people, depending on the size of the enterprise, how much data is involved, and the overall system architecture. What opened my eyes was the fact that, with this tool, you have the ability to alleviate all of the potential headaches associated with developing or maintaining your own clusters of these open source products. 

Large, well-known Asian companies literally have over 1,000 engineers dedicated to managing open source clusters. Those companies are wasting so much money, effort, and brain power by having their engineers focused on managing these really basic things, when they could be deploying a third-party tool like Equalum. They could be letting their engineers drive larger revenue opportunities with more value added around things like what to do with the data, and how to manage the data and the data flow. They could create real value from integrating data from disparate data sources, instead of focusing on the minutia, such as maintaining clusters. The mindset of some of these Japanese and Korean companies is back in 1995. That's the reality. It's a challenge because getting them to change their older business approach and ideology is always difficult. But that is what opened my eyes, the fact that this tool can literally alleviate thousands of people doing a job that they don't need to be doing.

As for the data quality that results from using the tool, it's dependent upon the person who's using the tool. If you are able to transform the data and take the information you want out of it, then it can help your data quality. You can clean up the data and land it in whatever type of structure you would like. If you know what you're doing, you can create really high-quality data that is specific to the needs of the organization. If you don't know what you're doing, and you don't know how to use the tool, you can create more problems. But for the most part, it does allow for data cleansing and the ability to create higher-quality data.

When it comes to Oracle Binary Log Parser as opposed to LogMiner, my understanding is that it has higher performance capabilities in terms of transacting data. It allows for the performance of data being migrated and/or replicated, as well as the transaction processing that takes place in the database, to happen in a much more performant way. The ability to handle those types of binary log requirements is really important to get the performance necessary. Equalum has a partnership with Oracle. Oracle seems to be very open and very positive about the partnership. Although we are replacing a lot of the Oracle GoldenGate legacy products, they don't seem to be too worried because we're connecting to their databases still, and we're landing data, in some cases, into their cloud-based data infrastructure as well. There's definitely a lot of power in the relationship between Equalum and Oracle. It's a very strong benefit for both the product and the company.
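For readers unfamiliar with the LogMiner route that a binary log parser replaces, here is a rough, hypothetical sketch of reading changes through Oracle's DBMS_LOGMNR package with the python-oracledb driver. The credentials, log file path, and schema are illustrative; the point is that LogMiner surfaces changes through a SQL view, an extra database-side layer that parsing the binary logs directly avoids.

```python
# Hypothetical sketch of CDC via Oracle LogMiner (the slower path that a
# binary log parser sidesteps). Credentials and paths are illustrative.
import oracledb

conn = oracledb.connect(user="cdc_reader", password="secret", dsn="dbhost/ORCLPDB1")
cur = conn.cursor()

# Register a redo log file and start a LogMiner session against it.
cur.execute(
    "BEGIN DBMS_LOGMNR.ADD_LOGFILE(LOGFILENAME => :f, OPTIONS => DBMS_LOGMNR.NEW); END;",
    f="/u01/oradata/ORCL/redo01.log")
cur.execute(
    "BEGIN DBMS_LOGMNR.START_LOGMNR("
    "OPTIONS => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG); END;")

# Each change surfaces as a row; SQL_REDO holds the replayable statement.
cur.execute(
    "SELECT scn, operation, table_name, sql_redo "
    "FROM v$logmnr_contents WHERE seg_owner = 'APP'")
for scn, operation, table_name, sql_redo in cur:
    print(scn, operation, table_name, sql_redo)
```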

From a business point of view, as a partner, we are continuing to educate resellers and bring them up to speed about the new kid on the block. It's always difficult being a new product in these markets because these markets are very risk-averse and they don't really like small companies. They prefer to deal with large companies, even though the large companies' technologies are somewhat outdated. It's a challenge for us to educate them and make them aware that their risk is going to be low. That's what's important at the end of the day: it's about lowering risk for partners. If adopting a new technology increases their risk, even if the performance of the technology is better, they won't go along with it, which is a very different mindset from North America, in my opinion.

Overall, we sell a combination of multiple products in the real-time data and Big Data AI space, and Equalum is the core of our offering. It literally does provide the plumbing to the house. And once you get the plumbing installed, it's generally going to be there for a long time. As long as it fits performance and as long as it continues to evolve and adapt, most companies are really happy to keep it.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor. The reviewer's company has a business relationship with this vendor other than being a customer: Reseller.
Selva Dhoom - PeerSpot reviewer
Technical Manager at a tech services company with 11-50 employees
Real User
Top 20
Automates manual activities and has helpful documentation that allows users to self-study
Pros and Cons
  • "I found SnapLogic valuable and what I found most valuable about it was its ETL feature. I also found its automation feature valuable. It can be used for automating manual activities. It can be used as a middleware for certain transactional data processing and minimal datasets and ETL activities."
  • "What could be improved in SnapLogic is that it was not capable in terms of processing a large number of datasets, but at that point, SnapLogic was evolving. It didn't give a lot of Snaps. I heard recently there are a lot of Snaps getting added and the solution was being enhanced, particularly to connect different data sources. When I was working with SnapLogic six months to one year back, I faced the issue of it not being capable of handling a huge volume of datasets or didn't have much of Snaps, and that was the drawback. If there is any large number of data sets, that's based on or depends on your configuration. If it is a huge volume of data, other traditional ETL tools such as Informatica and Talend can process millions and billions of records, while in SnapLogic, the Snaplex fails or it returns an error in terms of processing that huge volume of data. Informatica, Talend, or any other ETL tool can run for hours in terms of jobs, while SnapLogic jobs fail when the threshold is reached. SnapLogic isn't able to withstand processing, but I don't know if that's still an issue at present, because the solution is getting enhanced and it's been more than six months to one year since I last worked with SnapLogic. There are now a lot of Snaps getting added to the solution, and if it can overcome the limitations I mentioned, SnapLogic could be the go-to tool because currently, it's not being used as much in organizations. It's being used comparatively less compared to other retail tools."

What is our primary use case?

I used SnapLogic for ETL purposes, particularly for automating manual activities, for example, automating validation reports, running checks, and generating emails.

How has it helped my organization?

The business problems SnapLogic solved involved manual work, or manual validation, that my company runs against the backend database. The database is connected using Snaps, the validation is put as a query into those Snaps, and SnapLogic runs the validation. The solution generates the results and then sends a report to your email, so activities that used to be manual are now automated.
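As a rough illustration of the flow being replaced (this is not SnapLogic code), the manual version amounts to running a validation query and mailing the result. The host, query, and addresses below are hypothetical.

```python
# Hypothetical version of the manual flow SnapLogic automates: run a
# validation query against the backend database and email the result.
import smtplib
from email.message import EmailMessage
import psycopg2

conn = psycopg2.connect("host=backend-db dbname=app")
with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM orders WHERE status = 'INVALID'")
    bad_rows = cur.fetchone()[0]

msg = EmailMessage()
msg["Subject"] = f"Validation report: {bad_rows} invalid rows"
msg["From"] = "etl-bot@example.com"
msg["To"] = "data-team@example.com"
msg.set_content(f"Nightly validation found {bad_rows} invalid rows in orders.")

with smtplib.SMTP("smtp.example.com") as smtp:
    smtp.send_message(msg)
```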

My company also used SnapLogic for ETL processing with various inputs, such as CSV files, databases, and JSON file inputs, connected with Redshift. The solution can do file-level processing and take data from one data source and put it into another data source, so small ETL projects.

You can also take advantage of SnapLogic pipelines. When a pipeline is created, you can get URLs from that pipeline which can be triggered from AWS Lambda or any other triggering service. A vendor will provide a set of files that need to be processed. It's not a huge file; it's transactional data that the vendor provides every week. My company creates a pipeline, and the team configures the pipeline URL in Lambda, so whenever the client drops that particular file for processing, the URL gets triggered, and the SnapLogic job is triggered as well, as sketched below. The SnapLogic job picks the file up from the S3 location, or any source, then does the transformation and loads it into my company's Redshift database, and the data is then used for several downstream and reporting purposes.
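Here is a minimal sketch of that trigger path, assuming an S3 "object created" event wired to a Lambda: the handler calls the pipeline's triggered-task URL so SnapLogic picks up the new file. The URL format and token handling below are illustrative, not an actual configuration.

```python
# Hypothetical Lambda handler: on S3 object creation, call the SnapLogic
# triggered-task URL so the pipeline processes the new file.
import json
import urllib.request

PIPELINE_URL = "https://elastic.snaplogic.com/api/1/rest/slsched/feed/Org/proj/task"
BEARER_TOKEN = "..."  # in practice, read from an env var or Secrets Manager

def handler(event, context):
    s3 = event["Records"][0]["s3"]
    payload = json.dumps({
        "bucket": s3["bucket"]["name"],
        "key": s3["object"]["key"],
    }).encode()
    req = urllib.request.Request(
        PIPELINE_URL,
        data=payload,
        headers={"Authorization": f"Bearer {BEARER_TOKEN}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return {"status": resp.status}
```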

ETL is the major purpose of SnapLogic in the company, where it does the transformation by fetching data from one source and putting it into the other. Automation is another purpose.

What is most valuable?

I found SnapLogic valuable and what I found most valuable about it was its ETL feature. I also found its automation feature valuable. It can be used for automating manual activities. It can be used as a middleware for certain transactional data processing and minimal datasets and ETL activities.

What needs improvement?

What could be improved in SnapLogic is that it was not capable of processing a large number of datasets, but at that point, SnapLogic was evolving. It didn't offer a lot of Snaps. I heard recently that a lot of Snaps are being added and the solution is being enhanced, particularly to connect different data sources. When I was working with SnapLogic six months to a year back, I faced the issue of it not being capable of handling a huge volume of datasets, and it didn't have many Snaps, and that was the drawback.

Whether it can handle a large number of datasets depends on your configuration. With a huge volume of data, other traditional ETL tools such as Informatica and Talend can process millions and billions of records, while in SnapLogic, the Snaplex fails or returns an error when processing that huge volume of data. Informatica, Talend, or any other ETL tool can run jobs for hours, while SnapLogic jobs fail when the threshold is reached. SnapLogic isn't able to withstand that processing, but I don't know if that's still an issue at present, because the solution is being enhanced and it's been six months to a year since I last worked with SnapLogic.

There are now a lot of Snaps being added to the solution, and if it can overcome the limitations I mentioned, SnapLogic could be the go-to tool because currently, it's not being used as much in organizations. It's being used comparatively less than other ETL tools.

For how long have I used the solution?

I've worked with SnapLogic for over a year, but it's been a few months since I last worked with the product because I've switched companies.

What do I think about the stability of the solution?

SnapLogic is a stable solution, particularly for scheduled activities. You can schedule activities based on timing or triggers, and the solution functions well. In terms of performance, SnapLogic was good for minimal or transactional datasets. A job runs in a few seconds, or for hours depending on the volume of data, but when there's a huge volume of data, the job sometimes fails.

What do I think about the scalability of the solution?

The cloud version of SnapLogic is scalable, but I haven't personally tried scaling it.

How are customer service and support?

I have never reached out to SnapLogic's technical support directly; I only reached out to my company's administration team, who reach out to the SnapLogic support team.

How was the initial setup?

The setup and administration for SnapLogic were taken care of by a different team. In terms of development work, we can prepare pipelines and move them from a lower environment to other environments, so deployment-wise it was quite easy, particularly since we can export a pipeline into an XML file and upload it into the other environments. That portion was easy and doesn't take up much time. Deployment activities for SnapLogic are quite easy and doable and don't take much time, depending on the number of pipelines in the project folder.

What's my experience with pricing, setup cost, and licensing?

I'm not aware of the pricing for SnapLogic because I'm just an application developer.

Which other solutions did I evaluate?

I evaluated Informatica and Talend.

What other advice do I have?

I'm not sure which cloud provider was used by my company for SnapLogic, as that was maintained by a different team. I'm just an application user.

SnapLogic requires maintenance, but the administration or maintenance work was done by a separate team, so I'm not quite clear on the maintenance. There are releases I'm aware of, where I would get a notification that the product would be offline for some time because the team would be doing certain enhancements. The administration and maintenance team is given information on new releases, so my team, particularly the application development team, would test them and make appropriate changes if required. Sometimes there's a case where one Snap was removed and you need to replace it with a different Snap in a newer release or version. SnapLogic has regular maintenance, releases, and versions handled by a different team within my company.

There are approximately thirty to fifty users of SnapLogic within the company. There's a team dedicated to administration activities in terms of deployment, maintenance, release activities, and providing access. There's a team of application developers, which I'm part of, that uses the tool to create pipelines, work on SnapLogic to develop apps, and do ETL transformations.

I didn't work with the solution daily. I only worked with it whenever required.

SnapLogic is used comparatively less by organizations. I would opt for other traditional tools that have a lot of other features, such as a built-in cache for faster performance and the ability to process huge ETL load volumes; for example, Informatica and Talend, as well as PySpark. I would rate SnapLogic lower.

My advice to people looking into implementing SnapLogic is that if you have basic ETL knowledge, you can do self-study on this. There is also documentation for every Snap, so even if you don't know how to build pipelines or how to work with this tool, the documentation will be really helpful. There's a help button in the tool that, when clicked, takes you directly to that particular Snap's documentation. You can self-study and you can create it yourself. External training is not required, and that is one pro of SnapLogic. The UI portion is also good. There is a separate window for pipeline creation, and you can go to the Snaps and folder areas separately.

I'm rating SnapLogic eight out of ten.

My company is a customer of SnapLogic. It is an AWS partner, but not a SnapLogic partner.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Bruno Ramos - PeerSpot reviewer
CEO and Founder at HartB
Real User
Top 5
Improved our time to implement a new ETL process and has a good price and scalability, but only works with AWS
Pros and Cons
  • "The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features."
  • "The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS."

What is our primary use case?

It is a good tool for us. All the implementation in our company is done with AWS Glue. We use it to execute all the ETL processes. We have collected more or less five terabytes of information from the internet by now. We process all this data in our cloud platform and normalize the information. We first put it in a data lake that we have on AWS. After that, we use AWS Glue to transform all the information collected from around the internet and put the normalized information into a data warehouse.
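As a sketch of what one such job could look like, here is a minimal Glue script that reads a catalog table over the S3 data lake, normalizes a few fields, and loads the result into the warehouse. All database, table, connection, and bucket names are hypothetical, and the mappings are placeholders rather than the actual transformations.

```python
# Hypothetical Glue ETL job: read raw data from the lake, normalize a few
# fields, and load the result into a Redshift warehouse.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw table the crawler cataloged over the S3 data lake.
raw = glue_context.create_dynamic_frame.from_catalog(
    database="lake_db", table_name="raw_web_data")

# Rename and retype columns into the normalized schema.
normalized = ApplyMapping.apply(
    frame=raw,
    mappings=[("id", "string", "id", "string"),
              ("collected_ts", "string", "collected_at", "timestamp"),
              ("payload", "string", "payload", "string")])

# Load into the warehouse through a Glue connection to Redshift.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=normalized,
    catalog_connection="warehouse-redshift",
    connection_options={"dbtable": "web_data_normalized", "database": "dw"},
    redshift_tmp_dir="s3://my-glue-temp/")

job.commit()
```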

How has it helped my organization?

It has improved the time to implement a new ETL process by 30%. We have also seen a big improvement in the data science area.

What is most valuable?

The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features.

What needs improvement?

The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the plugins from Google and other vendors, but in the case of AWS Glue, we can only use AWS.

For how long have I used the solution?

I have been using this solution for two years.

What do I think about the stability of the solution?

In terms of stability, we had some problems in the past, but now it is okay. AWS provides an SLA, and the integration of the tools is good.

What do I think about the scalability of the solution?

Scalability is a very strong point of this solution as compared to other solutions like PowerCenter and Pentaho. In Pentaho, you need to install a lot of machines, but in AWS Glue, you just need to work out how many instances you need. You just put this information in a form and click okay. Magically, you have the scaled processes.
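The same "fill in a number" scaling can also be done programmatically. Here is a hedged sketch with boto3; the job name, role, script location, and worker count are hypothetical.

```python
# Hypothetical: resize a Glue job by updating its worker type and count.
import boto3

glue = boto3.client("glue")
glue.update_job(
    JobName="normalize-web-data",
    JobUpdate={
        "Role": "arn:aws:iam::123456789012:role/GlueJobRole",
        "Command": {"Name": "glueetl",
                    "ScriptLocation": "s3://my-glue-scripts/normalize.py"},
        "WorkerType": "G.1X",    # each G.1X worker corresponds to 1 DPU
        "NumberOfWorkers": 10,   # scale out by raising this number
    },
)
```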

We have 35 users of this solution, and they are engineers, DevOps, and data scientists. We have a lot of plans to increase the usage of AWS Glue in 2021.

How are customer service and technical support?

In the first year of using it, we had a lot of problems with the solution. Our team found more or less five bugs if I remember correctly. Our experience with AWS support was very good. The team in the US helped us to resolve the problems and fix the bugs. We are AWS partners.

Which solution did I use previously and why did I switch?

Before AWS Glue, we worked with Talend, PowerCenter, and Pentaho. In the case of PowerCenter, the biggest problem for us was the plugins because they were too expensive. That was the negative point of PowerCenter. 

In the case of Talend, the problem was that in Brazil, we didn't have professionals with the skills to work with Talend. In addition, we had to use the command-line interface, which was a terrible thing because it took more time as compared to other solutions.

In the case of Pentaho, we had the same problem as Talend. We didn't have a lot of professionals. Of course, we have some courses to train people in Pentaho. We work with the biggest companies in Brazil, and we need professionals every day, but we don't have professionals with experience in Pentaho.

How was the initial setup?

The initial setup process is totally easy. You just need to put some information in the forms, and then you just need to click some buttons, and it is complete. The process to provision new infrastructure with AWS Glue takes from 10 minutes to an hour.

What about the implementation team?

We have all the professionals inside the company.

What's my experience with pricing, setup cost, and licensing?

Its price is good. We pay as we go, based on usage, which is a good thing for us because it is simple to forecast. It is also good in terms of the financial planning of the company, and it is a good way to estimate the cost. It is also simple for our clients.

In my opinion, it is one of the best tools in the market for ETL processes because of the fact that you pay as you use, which separates it from other big tools such as PowerCenter, Pentaho Data Integration, and Talend.
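As a back-of-the-envelope example of why the pay-as-you-use model is easy to forecast: Glue bills per DPU-hour (the list price was around $0.44 per DPU-hour in many regions at the time of writing; check current pricing for your region). The workload numbers below are made up.

```python
# Hypothetical monthly cost estimate for a Glue workload.
DPU_HOUR_RATE = 0.44     # USD per DPU-hour (verify for your region)
workers = 10             # G.1X workers, i.e. 10 DPUs
hours_per_run = 0.25     # a 15-minute job
runs_per_month = 120     # roughly 4 runs a day

monthly_cost = workers * hours_per_run * runs_per_month * DPU_HOUR_RATE
print(f"estimated: ${monthly_cost:.2f}/month")  # estimated: $132.00/month
```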

What other advice do I have?

I would rate AWS Glue a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner