What is our primary use case?
We have roughly 8,000 jobs that run every day, managing everything from SAS to Python to PowerShell to batch, Cognos, and Tableau. Many of our plans involve constraints that require jobs to wait for other jobs to run before they do. Some of these plans are fairly complicated and others are reasonably simple.
We also pull information from SharePoint and load that data into Greenplum, which is our main database. SharePoint provides the CSV file, which we then move across to Linux, where our main agent loads it into the Greenplum environment.
Source systems acquire data that goes into Greenplum. A number of materialized views get populated, and that populating is done through ActiveBatch. ActiveBatch then triggers the Tableau refresh so that the reports that pull from those tables in Greenplum are updated. From just after source acquisition through to the final Tableau report, ActiveBatch is deeply involved in moving that data.
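For illustration only, here is a minimal Python sketch of the Greenplum leg of that pipeline, assuming a Greenplum version with materialized view support. The connection details, table, and view names are all hypothetical; in practice these steps run as ActiveBatch jobs calling our own scripts:

```python
import psycopg2  # Greenplum speaks the PostgreSQL wire protocol

# Hypothetical connection details and object names.
conn = psycopg2.connect(host="gp-master", dbname="analytics",
                        user="loader", password="***")

with conn, conn.cursor() as cur:
    # Load the CSV that was staged from SharePoint onto the Linux agent.
    with open("/data/staging/sharepoint_extract.csv") as f:
        cur.copy_expert(
            "COPY staging.sharepoint_extract FROM STDIN "
            "WITH (FORMAT csv, HEADER true)",
            f,
        )
    # Repopulate a reporting view that Tableau reads from.
    cur.execute("REFRESH MATERIALIZED VIEW reporting.claims_summary")
conn.close()

# A downstream ActiveBatch job then triggers the Tableau refresh (via
# Tableau's REST API) so the reports pick up the new data.
```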
We have 19 agents if you include the Linux environment, and 23 if you count the dev environments. It's huge.
It's on-prem. We manage the agents and the scheduler on a combination of Windows and Linux.
How has it helped my organization?
We have some critical processes in ActiveBatch that go to finance and to the auditors in our organization. Those processes are highly critical because they allow us to trade. If those reports don't get to them, we get penalized by the government, by APRA, or by some financial institutions. ActiveBatch, in this particular case, is absolutely critical for getting those reports out.
We have SLAs requiring us to get reports out by a certain time of day, or by a certain time on a certain day of the month. We're judged on whether those reports go out. ActiveBatch, being as stable as it is, is only impacted by external factors like the network and database performance. Otherwise, we are quite comfortable with the way ActiveBatch handles these jobs without our having to look at them.
Because the connections between ActiveBatch and other tools are automated, it gives us more time to do other, more interesting things. If something goes wrong, we can go back and look at the logs it produces, see what's going on, and repair it. It's an enabler, and it provides us with more time to get on with other jobs. It's critical, it runs by itself, and we're really happy it does. We have that time available because we're not manually babysitting processes.
It provides a central automation hub for scheduling and monitoring, bringing everything together under a single pane of glass. Finance, sales, marketing: pretty much every department has a job that we deal with. It's quite heavily integrated into our whole stack. As an insurance company, our major events department, for example, is critical because every time there's a storm, a hail event, or a cyclone somewhere, those reports must get out in a timely manner. I can't think of any department that isn't impacted by ActiveBatch running some report for them.
The single pane of glass helps the DataOps team manage all of the processes that are supported by ActiveBatch as the main scheduling tool. We've created a dashboard which pulls information from ActiveBatch, information that we can share with the organization. They can look at jobs and the schedules and, if necessary, run their own jobs from that point. It's like the lungs of our company.
Overall, it has helped to improve workflow completion times by 70 to 80 percent, easily. Once you've built a job, it just runs and no one has to concern themselves with it doing what it's doing. They will get the notification or the file or the email that says it's processed and they move on with their day.
In addition, we had a guy who was spending seven hours a week to extract, compile, and then export information into a CSV file, and then another few hours to get it transferred to another department. We were able to build a PowerShell script, with a query that could easily be updated, that was automated through ActiveBatch. It takes 10 minutes to run. What that guy was doing in hours, we now do in minutes.
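The production job is a PowerShell script, but the shape of it is simple. Purely as an illustration, here is roughly the same query-to-CSV export sketched in Python; the query file, connection details, and paths are all hypothetical:

```python
import csv
import psycopg2

# The real job keeps the query in a separate file so it can be updated
# without touching the job itself; these names are placeholders.
QUERY = open("/jobs/exports/weekly_extract.sql").read()

conn = psycopg2.connect(host="gp-master", dbname="analytics",
                        user="reporter", password="***")
with conn, conn.cursor() as cur:
    cur.execute(QUERY)
    with open("/data/outbound/weekly_extract.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # column names
        writer.writerows(cur)                                 # result rows
conn.close()

# ActiveBatch schedules this, then a follow-on step transfers the CSV
# to the other department's drop location.
```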
What is most valuable?
One of the valuable features is the ability to tie ActiveBatch into other applications using API calls. The native integrations and REST API adapter for orchestrating the entire tech stack are very good and user friendly. We have a product called ServiceNow, which is a call tracking system. If a problem occurs, ActiveBatch will send an API call into ServiceNow, and it will raise a ticket to say that there's a problem. That gives us an auditing process. We're also using API calls for Tableau and we're also using some API calls for SharePoint. We tie ActiveBatch into a lot of different applications.
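To illustrate the ServiceNow side, the call is just a REST request against ServiceNow's standard Table API. A minimal Python sketch, with a hypothetical instance name, service account, and incident details, might look like this:

```python
import requests

# Hypothetical instance, credentials, and failure details.
url = "https://ourcompany.service-now.com/api/now/table/incident"
payload = {
    "short_description": "ActiveBatch job 'GP_Load_Claims' failed",
    "description": "Exit code 1 at step 3; see the ActiveBatch log.",
    "urgency": "2",
}

resp = requests.post(url, json=payload,
                     auth=("ab_service_account", "***"),
                     headers={"Accept": "application/json"},
                     timeout=30)
resp.raise_for_status()

# ServiceNow returns the new record, including its incident number,
# which gives us the audit trail.
print("Raised ticket:", resp.json()["result"]["number"])
```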
Also, the overall ease of use is brilliant. It's easy to pick up. We can get a newbie up and running within a day using ActiveBatch. It's not to the extent where that person will know some of the more complicated aspects, but in terms of being able to build a job and export or run it, that takes a couple of hours. Within a day, people are quite comfortable with the application. We've just signed an agreement with ActiveBatch which gives us all the education materials, so we'll be applying more advanced features. It's really good as far as ease of use goes.
We use the solution across all sorts of organisational branches. It's used for SAS and SAP, which is finance. We have fraud and Salesforce, which is for the sales group. It's also used by marketing and major events because, when there's a storm, we need to know what's going on. We also have the ability to pull from external sources, meaning external vendors such as Guidewire. So ActiveBatch is widely utilised, probably more widely than the executives realise. It's well embedded in our company.
What needs improvement?
We have moved to version 12, and I believe the interface has more of a "webbie" look and feel.
A nice thing to have would be the ability to comfortably pass variables from one job to another. That was one of the things that I found difficult. Other than that, it's all good.
For how long have I used the solution?
I've been with this company for over 10 years, and ActiveBatch was already here before I arrived.
What do I think about the stability of the solution?
The most valuable feature is its stability. We've only had very minor issues, and generally they have happened because someone applied a patch on a Windows operating system and it caused some grief. We've been able to resolve those issues quite quickly with ActiveBatch. In all the time I've used ActiveBatch, it has never completely failed. Uptime is almost 100 percent.
With those 8,000 jobs running in a 16-hour period, the majority of the time we spend about an hour a day with ActiveBatch, repairing problems. There are issues where we have to re-run a job because it exceeded its expected runtime. Or when a job fails, even though the alert goes out to the end user, we still have to tap the user on the shoulder and say, "Did you look at this alert? We've got a problem here, can you please fix it?" Other than that, it pretty much runs itself. Overall, ActiveBatch saves us a huge amount of time, being as stable as it is.
If we were having to repair everything, on an ongoing basis, we would be spending more than five or six hours a day, so we are saving at least five to six hours a day by using this tool. The improvement to the business is quite substantial. People aren't having to manually do anything that would normally take them two or three hours to do. Those things are being done within a matter of minutes and then passed on. And those five or six hours are just for us in our department. You can multiply that by the number of people who would normally have done something manually and who now have it done through ActiveBatch in minutes.
We're looking at more than a 98 percent success rate for uptime and for running jobs. When something does fall over, it's not due to ActiveBatch itself; rather, it's due to problems with the network, the database, or developers.
What do I think about the scalability of the solution?
The scalability is brilliant. We've got 23 machines. We have redundancy integrated into this environment.
If a server goes down, we can turn that queue off and re-queue those jobs to another server, while we get a new image spun up and restarted. In that situation, the delay is in getting the IT guys to spin up the image. If we could get an image spun up when it failed, it would be a matter of five or 10 minutes to be back in business with that server. As it is, once the IT guys do spin it up, we kick off from there.
The main interface is used by about 12 people. The dashboard that we've built on top of it is probably used by 70 to 80 people. But the number of people it affects is in the thousands across the entire organization.
It's heavily utilized across a number of departments in the organization and they really do rely on ActiveBatch to stay up and stable and to provide their reporting mechanisms.
How are customer service and support?
We've had a couple of issues where we've had to log a defect with ActiveBatch. But the guys at ActiveBatch are really responsive. We had things fixed in 24 hours, and they're in a different time zone. The response time is exceptional. This is one of the few vendors that I can say is highly responsive and that shows a level of commitment that I don't think many other organizations show.
How would you rate customer service and support?
Positive
Which solution did I use previously and why did I switch?
ActiveBatch replaced Windows Scheduler and cron jobs that had been running on some servers. There was also another scheduling tool that popped up somewhere, but that data was moved into ActiveBatch. The scheduling from Cognos was also moved into ActiveBatch because it was more convenient, and some of the Tableau scheduling was moved into ActiveBatch as well.
How was the initial setup?
The initial setup was straightforward. It's super-easy to install and super-easy to set up. Even on the Linux box, it was really easy to install and set up and run. There was no real complexity in the installation process.
Most of the time with setup or upgrades is spent testing. We usually deploy agents within 20 minutes. The scheduler and the database might take an hour and a half, but because the agents are on virtual machines, we have an image and we just spin that image up. If something goes wrong, we can just spin up a new image and get that agent started straight away. In terms of testing, when we do disaster recovery, we redeploy to a disaster recovery environment and then we test that the connections are working, the jobs are running, and that there are no problems. That's where most of the time is spent, not in the deployment itself.
We usually have two people involved in the process, one who is the primary and one who is the secondary. And then we have a couple of people on standby. The primary does the installation and the secondary is looking over their shoulder for learning purposes. Then we have a few people on the IT side in case there is a problem with the operating system or the network that we have to deal with, but they're not involved until there's a problem. The DBA is also on-call just in case there's an issue with the database.
Maintenance-wise, we only go and look if something happens. We have a job that looks at the health of the database that ActiveBatch uses; it's pretty much all automated, so it looks after itself. We have another job that pings the servers to make sure that all the ports it needs are open. We also have jobs that look at network latency, so that if latency goes beyond a certain point, they notify IT and us. It also looks at the operating system and the actual directories. Unless we schedule an upgrade, which we do every six months, we don't look at maintenance in between unless there's a problem.
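As a rough illustration, one of those port and latency checks boils down to something like the following Python sketch; the hostnames, ports, and threshold are all hypothetical:

```python
import socket
import time

# Hypothetical hosts and the ports each one must have open.
CHECKS = {"gp-master": [5432], "ab-scheduler": [1024], "linux-agent01": [22]}
LATENCY_THRESHOLD = 0.25  # seconds; beyond this we notify IT and ourselves

for host, ports in CHECKS.items():
    for port in ports:
        start = time.monotonic()
        try:
            # Open and immediately close a TCP connection; the elapsed
            # time is a crude latency measure.
            with socket.create_connection((host, port), timeout=5):
                latency = time.monotonic() - start
            if latency > LATENCY_THRESHOLD:
                print(f"WARN {host}:{port} slow ({latency:.3f}s)")
            else:
                print(f"OK   {host}:{port} ({latency:.3f}s)")
        except OSError as exc:
            # An unreachable port triggers the alerting job.
            print(f"FAIL {host}:{port} unreachable: {exc}")
```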
What about the implementation team?
ActiveBatch has been implemented in-house.
What was our ROI?
It pays for itself because it gives the DataOps team more time to be involved in other projects. It allows the organization to move forward without having to worry about doing anything manually. ActiveBatch is performing a huge service to the organization in terms of reducing the number of man-hours required to do manual tasks.
What's my experience with pricing, setup cost, and licensing?
If you compare ActiveBatch licensing to Control-M, you're looking at $50,000 as opposed to millions.
Which other solutions did I evaluate?
ActiveBatch isn't the only scheduling tool that we have. There's also a product called Control-M, but Control-M is a lot more expensive and mostly manages the mainframe. ActiveBatch is at a very modest price for running a very complex process.
We can expand ActiveBatch more readily than Control-M because, with Control-M, you pay for X number of runs in a run book. If you want to extend that run book, they want half-a-million dollars, or more, for 500 jobs. We can expand ActiveBatch. We could go to 10,000 jobs and it wouldn't cost us any more. It's only if we were to add more agents to load balance that we would be charged any more, and it wouldn't be anywhere near what Control-M charges.
I've mainly been involved with ActiveBatch and it's hard to compare another vendor when there hasn't been a vendor to compare against. As far as performance is concerned, Control-M and ActiveBatch are on par, but they're not the same because Control-M is really just moving files and running programs on mainframes, whereas we're running against Windows and Linux environments.
The other one being utilized at the moment is Apache Airflow, but that's more for the developers because they like to be able to program the backend, rather than use a frontend interface. We've been looking at how that works, but we haven't found it stable enough for a production environment. You can't really compare Airflow with ActiveBatch.
What other advice do I have?
My advice would be to jump on it straight away. With the ease of installation, the expandability or scalability of the product across multiple servers with different agents, the ability to not only use Windows but Linux as well, and the fact that you can build complex plans that have multiple constraints, multiple types of scheduling, and multiple types of alert mechanisms, it's highly expandable. You're going to have a lot of fun with it.
It's highly flexible and easy to use. In terms of what we can do, we still haven't found the limits of what ActiveBatch can do. It's incredibly flexible. We're running shell scripts that run Python scripts. We've got PowerShell scripts and batch scripts. We tie into different applications. We still haven't exhausted the potential of ActiveBatch. That's what I've learned.
Predictability is something that is out of the control of ActiveBatch. We can set a job to run against a database, but it's really going to be the network or the database that will impact ActiveBatch. ActiveBatch will continue to run. There is an average run time that we look at, but if the network has high latency or the database is under load, the time will increase. ActiveBatch will continue to run as normal. The frequency of ActiveBatch failing is quite rare.
We use the ActiveBatch interface up to a certain point, and then we start running Python and shell scripts. That's why we have the Linux agent. We call a shell script which runs a Python script that does some manipulation and passes that information back. And then there are a number of plans that manipulate the process. In this particular plan, the CSV file is created and dropped into a file location. ActiveBatch is polling that location and sees the file. Then a Python script runs and creates an MD5 hash. When you download a file from the internet, there's an alphanumeric string that indicates whether that file is valid or not. The MD5 hash is generated on the file and, when it's moved to another location, another MD5 hash is generated to determine whether the file changed as it moved from A to B. It's a validation to make sure that no data was corrupted during the movement from where the file was dropped to where the file landed. Once it has been validated, the file is moved into another location where it's uploaded into the Greenplum database, and a notification is sent to whoever was involved in that particular process. It's quite involved.
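The validation step itself is simple. A minimal Python sketch of the hash-and-compare logic, with hypothetical paths, looks something like this:

```python
import hashlib
import shutil

def md5_of(path, chunk_size=1 << 20):
    """Hash the file in chunks so large CSVs don't need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

src = "/data/dropzone/extract.csv"    # where the CSV is dropped
dst = "/data/validated/extract.csv"   # where the load step picks it up

before = md5_of(src)
shutil.move(src, dst)
after = md5_of(dst)

if before != after:
    # Fail the job so ActiveBatch alerts whoever owns this process.
    raise RuntimeError("File corrupted in transit; aborting the load")

# Otherwise the next ActiveBatch step uploads the file to Greenplum
# and sends the notification.
```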
If a job fails, we have set it to wait for a few minutes and then re-run. If that fails, and the failed job isn't critical, we can trigger another job to continue on in that process flow. Some of the plans are quite complicated and have a certain amount of logic involved, but that enables us to navigate around problems that might otherwise need a developer's assistance, as long as it doesn't affect the overall plan process. As long as there are no constraints requiring the next job to wait on the failed one, the plan can move around that job and continue on. That's how we set it up.
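ActiveBatch provides this retry and continue-on-failure behaviour natively through job properties and constraints; the following Python sketch only illustrates the logic, with a hypothetical job script and delay:

```python
import subprocess
import time

RETRY_DELAY = 300   # wait a few minutes before the re-run
CRITICAL = False    # whether downstream jobs are constrained on this one

def run_job(cmd):
    """Run a job step and report success via its exit code."""
    return subprocess.run(cmd, shell=True).returncode == 0

if not run_job("/jobs/refresh_views.sh"):
    time.sleep(RETRY_DELAY)
    if not run_job("/jobs/refresh_views.sh"):
        if CRITICAL:
            # Downstream constraints need this job, so halt the plan.
            raise SystemExit("Critical job failed twice; halting the plan")
        # Non-critical: alert the owner and let the plan continue.
        print("Job failed twice; continuing with the next step in the plan")
```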
We're looking forward to version 12 to see how that goes as well. We've also mirrored the database, the backend database that ActiveBatch uses. We have a failover process which was just recently installed. If one database fails, we can switch over immediately to the other database in real time.
Overall, we're really comfortable with how ActiveBatch is performing and with what it's doing.
Which deployment model are you using for this solution?
On-premises
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.