We were looking for some ETL tools.
Software Engineer Specialist at an energy/utilities company with 1,001-5,000 employees
Provides a self-managed, self-healing system where I don't have to do many actions
Pros and Cons
- "All our architectural use cases are on a single platform, not multiple platforms. You don't have to dump into different modules because it is the same module everywhere."
- "I should be able to see only my project versus somebody else's garbage. That is something that would be good in future. Right now, the security is by tenants, but I would like to have it by project, e.g., this project has this source and flows in these streams, and I have access to this on this site."
What is our primary use case?
How has it helped my organization?
We were writing data from one source to a target for 5,000 websites. No matter how generic you make it, it is never generic. Also, some things will change requirements-wise. A tool should be easy for a support group to support. They should not need access to Linux, or wherever Python scripts are running, to figure out the logging. Now, I just open a box to them and say, "This is what you need to do in Equalum." They just have to point and click, which is operational efficiency. It increased my efficiency by 80 percent by wrapping things around it, e.g., the callable API from my application. It really helps with building, support, and monitoring. It is all in the UI.
It has improved our data accuracy. It tells you where things are not matching. For example, bad dates were coming in and the target database would not accept this format. So, Equalum will tell me if there is a problem over there. For error logging and error messaging, it is very efficient. It tells you what the problem is, e.g., your data type is not long enough on the targets. The logging is efficient, very detailed, and will also tell you where the problem is. You can fix the data, transform it, or change the target to accept that type of data. The accuracy is 100 percent. I have not seen any data anomalies.
I do batch loads and write them to temporary/work/staging tables. From there, I want to write them to the real table. When I go over the network with a million records, if I'm writing to a table by deleting the data and writing it again, that might take two to three minutes. My data will disappear for that time. However, if I am writing from a staging table to another table within the database, like MemSQL, it takes only a few seconds to write within MemSQL itself, so the time my data disappears is minimized. For that, I am required to follow a procedure to move the data from the staging table to the final table, and they added that functionality for us by integrating with other database technology and functions. This is one example of integrating other things into the tool.
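The staging-table pattern described above can be sketched in plain SQL. Here is a minimal, self-contained simulation using Python's sqlite3 in place of MemSQL; the table and column names are hypothetical, not from the actual system:

```python
import sqlite3

# In-memory SQLite stands in for MemSQL here; table and column names are
# hypothetical. isolation_level=None gives autocommit mode so we can manage
# the swap transaction explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()
cur.execute("CREATE TABLE readings_staging (id INTEGER, value REAL)")
cur.execute("CREATE TABLE readings (id INTEGER, value REAL)")

# 1) Slow step: batch-load over the network into the staging table.
cur.executemany("INSERT INTO readings_staging VALUES (?, ?)",
                [(1, 10.5), (2, 11.2), (3, 9.8)])

# 2) Fast step: replace the final table from staging inside one
#    transaction, so readers never see the table empty for minutes.
cur.execute("BEGIN")
cur.execute("DELETE FROM readings")
cur.execute("INSERT INTO readings SELECT * FROM readings_staging")
cur.execute("COMMIT")

row_count = cur.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(row_count)  # 3
```

The key point is that only the in-database staging-to-final copy happens inside the transaction, so the window where the final table is unavailable shrinks from minutes to seconds.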
What is most valuable?
Their performance monitoring (how things are flowing) is very visual. If something is failing, you can see it there at a high level, e.g., your sources are down, your agent is down, or your flow is not running. All those kinds of things are very visual. You just log into the tool and can see what is happening.
The alerting system is getting better with every release.
It takes me an hour to transition the solution's knowledge to somebody else. It is really efficient that way. I haven't seen any complications.
All our architectural use cases are on a single platform, not multiple platforms. You don't have to dump into different modules because it is the same module everywhere.
It is a self-managed, self-healing system. For example, I have been getting alerts on CPU usage. They say, "CPU usage is high." Then, it sends me a warning or critical alert within seconds so I can see that it has been resolved. For example, if the database goes down, then it stops at that point, keeps on trying until the database comes up, and begins to heal itself. So, it is self-recovering and self-healing. It is the same for the target. If the target goes down, it sends you an alert saying, "The target has gone down." I don't worry about it. I can ignore the alert, because when the target comes back, it starts all over again. I really like the self-recovering and self-healing because I don't need to take many actions.
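The self-healing behavior described here, retrying until a down source or target comes back, resembles a simple retry loop with backoff. A generic sketch of the idea (this is an illustration, not Equalum's actual implementation):

```python
import time

def run_with_retry(step, is_up, max_attempts=5, base_delay=0.01):
    """Keep checking a down endpoint, with backoff, then run the step."""
    for attempt in range(max_attempts):
        if is_up():
            return step()
        time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError("endpoint stayed down; raise a critical alert")

# Simulate a target database that comes back up on the third health check.
checks = iter([False, False, True])
result = run_with_retry(lambda: "loaded", lambda: next(checks))
print(result)  # loaded
```

From the operator's point of view, this is why the alert can be ignored: the flow resumes on its own once the health check succeeds, and only a sustained outage escalates.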
Initially, I wanted incremental data loads as well as batch, and they put those together very fast. The same with our security: initially, it was very basic, then they enhanced it to the level that we wanted. So, they came back with solutions quickly. Alerting is another feature that they put together, though we did not ask for it.
In multi-tenant architecture, if I am in finance and another department is on the operations' side, then we don't have to go into each other's area. We can have our own separation of products, which is pretty cool.
What needs improvement?
When we bought the tool, some of the features were missing, but we knew the power of the ETL was very good.
Initially, I wanted scheduling within the tool itself, which is not there. However, I am using another open source software scheduling tool, Rundeck, which calls the Equalum APIs and runs them on the Equalum server. That was a workable solution for me to schedule the data loads. I wanted something with a UI interface where I could schedule within the tool, which is an improvement point for them. Right now, I have a workaround.
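A minimal sketch of that workaround: the external scheduler (Rundeck, cron, or similar) runs a small script that triggers a flow over HTTP. The host, endpoint path, and payload shape below are hypothetical placeholders, not Equalum's actual API surface:

```python
import json
import urllib.request

# Hypothetical host; substitute your own server. The endpoint path and
# JSON payload are also placeholders, not a documented Equalum API.
EQUALUM_HOST = "https://equalum.example.com"

def build_trigger_request(flow_name: str, token: str) -> urllib.request.Request:
    """Build the HTTP request the scheduler would send to start one flow."""
    body = json.dumps({"flow": flow_name, "action": "start"}).encode()
    return urllib.request.Request(
        url=f"{EQUALUM_HOST}/api/flows/run",  # hypothetical endpoint
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request("daily_finance_load", "secret-token")
print(req.get_method(), req.full_url)
```

In a Rundeck job, a script step would build and send a request like this on the schedule you define, which gives you the UI-based scheduling the tool itself lacks.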
For any application that you start, if it doesn't have a feature but can be integrated with other applications, then it is a good tool. We are working with DB2, and there are some roadblocks there for us, but we are working through them.
I should be able to see only my project versus somebody else's garbage. That is something that would be good in future. Right now, the security is by tenants, but I would like to have it by project, e.g., this project has this source and flows in these streams, and I have access to this on this site.
Something not in the tool is a CLI interface. The interface is not open to everybody. It is very restricted to admins and DBAs, like me.
For how long have I used the solution?
I have been using it for two and a half to three years.
What do I think about the stability of the solution?
Knock on wood, it doesn't break down that much unless the network goes down, and there is nothing you can do about that. So, I have not seen any problems so far.
They do different loads now:
- Direct from the source
- Via the Kafka architecture, continuously pulling the data from there

The Kafka architecture also provides stability.
What do I think about the scalability of the solution?
We started with a three-node cluster. As we are growing, we are seeing so many flows coming through. We are hoping to extend it, maybe creating another cluster or set of clusters. When it comes to scalability, we haven't done it yet. We have so many projects in there with a lot of data going back and forth. However, we are thinking about it as we grow. It should not be an issue.
On my team, there are only two to three people who do streams and loading for their own projects and systems. I have one DBA who helps me out on the SQL Server side. We have 10 divisions, and across those 10 divisions, I have the same flow names and table names. Everything is the same, except the schema is different. They gave us the specs to copy over, so I have saved time. So, whenever a new division comes in, we just copy over and replicate. This creates the flows and streams in the background through the command-line interface.
It is being used for around seven projects: document management system data, financial data, and sending data to other systems or different projects.
How are customer service and technical support?
Their technical support is amazing. They are on Slack all the time. We check with them and send messages. They have this site where you can see the tickets. For example, if a flow keeps on failing, then I just send it to them, asking, "It is failing. What is the reason?" Most of the time they can come up with the error message regarding the issue, then either we can fix it on our side or they fix it on their side by finding the issue. Their support is 24/7.
They are very technical. Even the customer support staff are not just taking your requests and saying, "Let me go back to the technical team, then I'll come back to you." They are the point people who know everything about the tool technically, inside and out. If they can't resolve it, then it goes to the technical team. That is the good part.
The way that they are growing is pretty efficient for us. They come and ask, "What else do you want?" Then, we tell them, "This is what we want." Initially, for example, security was not there, but they developed a patch for us, because that was a concern. Also, they develop alerts very fast.
I keep sending them issues every now and then. They are a very smart group of people and developers who know what they are doing, which is a good thing. When somebody asks for something, they recognize it as a good collaboration opportunity for them.
Which solution did I use previously and why did I switch?
I have extensive experience with ETL tools, starting from Oracle Warehouse Builder to Informatica. So, we were looking for something that can do change data capture (CDC) for us.
Previously, we did not have an ETL tool.
How was the initial setup?
The ETL deployments were very straightforward.
The PoC was good. The changes came in right away. Then, we could start using the tool, which was installed on our system.
It is not that bad to set up a flow from a table, but if I had to copy 10 tables from a source to a target, I would have to prepare 10 flows. So, they came up with something called schema replication groups. I can go into a schema and say, "I want to copy over that replication group for 10 tables." Then, it will create the tables, steps, etc. on the target. I can modify those tables, if I want to, and map them again. This makes it more efficient to copy data to multiple tables at a time.
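The saving comes from generating one flow definition per table instead of hand-building each. A rough illustration of the idea in Python; the flow-definition shape here is hypothetical, not Equalum's internal format:

```python
def replicate_schema(schema, tables, target_schema):
    """Generate one source-to-target flow definition per table in a schema.

    The dict shape is a made-up illustration of a 'replication group',
    not an actual Equalum flow format.
    """
    return [
        {
            "name": f"{schema}_{table}_flow",
            "source": f"{schema}.{table}",
            # Same table name on the target; only the schema differs,
            # matching the per-division copy-and-replicate setup.
            "target": f"{target_schema}.{table}",
        }
        for table in tables
    ]

flows = replicate_schema("FINANCE", ["INVOICES", "PAYMENTS"], "STAGE")
print(len(flows), flows[0]["target"])
```

Onboarding a new division then reduces to calling the generator with a different schema name, rather than clicking together ten flows by hand.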
I am finding it very efficient for my team. When it comes to wider usage, people need to move from the old architecture to the new architecture, which is a big effort that will take time. We like the solution, but we cannot just stop operations that are happening and move onto this tool. Eventually, that is the direction we will go.
What about the implementation team?
With a very small footprint, all we had to do was put the hardware together for them. They installed everything, which was very convenient for us.
They did the deployment overnight. We didn't do it. We just had to set up the servers, authorizations of the server, and the three-node cluster. After that, they took it over. So, they did that stage first, then they did production after that.
When they came, they did CDC for us, but our requirements went even beyond CDC. For example, if you are in an OLAP type of environment, CDC is not the most efficient solution. You want to run queries based on certain criteria to get incremental loads. That is something they developed for us pretty quickly, within a month or so.
Equalum has access to our structure. They upgrade and maintain the solution for us.
The Equalum team handles the deployment completely for us. Once they have deployed to staging, we do some testing, which takes a day or so on our side. Before we approve it for production, we want to make sure everything works in staging.
What was our ROI?
Cost-savings-wise, if I had to do all the Python scripting and use the libraries myself, there would be so much research involved, and I might need a few more developers to do it. Now, with Equalum, one person can just keep creating streams and loading.
It does save you time and money in so many ways at the end of the day. Equalum gave us a solution to copy over and change a few things here and there, instead of creating and developing everything each time a new division joins the company.
What's my experience with pricing, setup cost, and licensing?
Equalum was reasonably priced. It is not like those million dollar tools, such as Informatica.
If you want to buy it, watch Equalum's YouTube videos first.
Which other solutions did I evaluate?
We tried a few other ones before we got into Equalum, like Striim and StreamSets. The reason we were looking at small companies or products that are not too expensive was because we had adopted a technology called MemSQL. This is a database technology, and we were trying different products with smaller vendors or product owners who were willing to work with us and adapt their tools to step into our organization. The problem was we were getting data from Oracle to MemSQL, and those other tools could not perform CDC from Oracle. It was super slow because they are not mature yet. So, we tried them, then left them because they were not working for us.
There were some solutions that we didn't want to touch, like Informatica, which are enormously expensive. You had to really plan, budget, and get that approved.
We wanted a small solution to just get more data over here and there. I am still writing the scripts left and right. Maintaining tests was another issue. The scripts are open too, where people can access them. If they get broken, you never know who broke it. That manageability is a big thing for me.
Equalum was another tool that I found. As soon as we contacted Equalum, they wanted to come and do a free PoC for us, going right from Oracle to MemSQL. They came for a week, staying online, working with our system and proving the solution would work for us.
Equalum's performance and UI were good at the time of evaluation. Other products did not have such an efficient, good UI. There were quite a few things which impressed our management and directors of operations. Everybody thought this was the tool for us to try.
Other tools were like a black box for me, which was not the case with Equalum.
What other advice do I have?
Go for it. It is a good tool. They are growing. Hop in now and take advantage of their pricing. The tool is worth it.
It is a very intuitive tool. You don't have to do much in it, just map the things and it works.
I can use Equalum's API in other tools.
They have implemented MongoDB support.
Initially, they had Oracle CDC using Oracle LogMiner. The binary log parsers available elsewhere were super expensive, the same as Informatica; those vendors were charging for everything, even the PoC. At that time, we were telling Equalum, "You should have binary reads too." However, when you scope a project for a growing product, you have to prioritize things.
They are now coming out with the latest version with an Oracle Binary Log Parser, which they installed for us. The next version will be even better. They have very good collaboration with their clients. It is not just Oracle Database features that they are putting in; pretty much every client gives them requests, then they put those on the priority list and keep growing with their clients.
They just introduced the Oracle Binary Log Parser. Two to three months back, we tested it. It is faster than LogMiner by 30 to 40 percent, which is an improvement time-wise. We have not implemented it yet. I would like to implement it for any new project; I have to find time to do that. I haven't worked on changing the existing flows from LogMiner to the Binary Log Parser. I have to work with Equalum on how to redo all of them, or how we can switch over to the Binary Log Parser. It is not the highest priority, but if tomorrow I had to do a new project, then I would go with the Oracle Binary Log Parser.
There is a lot of promise in the future, which is something to think about. We do plan to increase usage. We have a lot of projects coming up. I want to experiment with it in more ways, like with Kafka as a source and as a target, where we can distribute data to multiple applications. There are quite a few things in our pipeline. It is just finding time to figure out how we can do them. We have not fully explored this tool yet. It has a lot of potential, especially the transformations and creating workflows, which go beyond very simple data replication to the target.
Overall, I would rate Equalum as a nine out of 10.
Which deployment model are you using for this solution?
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.