What is our primary use case?
We first receive large volumes of data as flat files, consolidated by one of our partners from all the operational data sources and the supply chain.
What is most valuable?
It's a service model, so we only pay for what we need, rather than keeping a server sitting in the corner for three environments that we may or may not always need.
What needs improvement?
We're only considering enhancing the presentation layer to give a more multidimensional OLAP view than what AWS seems to have decided on. Redshift with the data mart structure is like an OLAP cube. Oracle Analytics Cloud is overkill and is not what we need. I looked at Mondrian, which used to be part of another vendor's open-source stack and does work. I am also looking at some of the other OLAP environments like Kaiser, and we might perhaps go to Azure with Microsoft Azure analysis cloud, but that's not multidimensional either, the way SSAS used to be.
We tried Mondrian, and it didn't perform as we expected. So we are looking at setting something up to perform as an OLAP layer in the cloud, preferably on AWS; otherwise, we might consider an Azure solution.
For how long have I used the solution?
I have experience with AWS Data Pipeline.
What do I think about the stability of the solution?
It is a stable solution. We have had no problems with AWS; the AWS Glue service is robust for our data pipeline. We've never had any problems with it in the year or so that we've been using it.
What do I think about the scalability of the solution?
It is a scalable solution. I originally set it up to run entirely serially. For example, if each job took a minute and I had a hundred jobs, it used to take a hundred minutes to get them all done. We weren't considering parallelism previously, but now that we've moved to parallel execution, all hundred jobs finish in fifteen minutes or less.
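To illustrate the serial versus parallel timing described above, here is a minimal sketch in Python. It is not our actual Glue setup: `run_job` is a hypothetical stand-in for triggering one job and waiting for it, and the worker counts are assumptions chosen to match the numbers in the text (a hundred one-minute jobs).

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def run_job(job_id: int, duration_s: float = 0.01) -> int:
    """Hypothetical stand-in for starting one job and waiting until it ends."""
    time.sleep(duration_s)
    return job_id


def run_serial(job_ids):
    """Run the jobs one after another, as in the original setup."""
    return [run_job(j) for j in job_ids]


def run_parallel(job_ids, max_workers):
    """Run up to max_workers jobs at the same time; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, job_ids))


def estimated_minutes(n_jobs, minutes_per_job, max_workers):
    """Rough wall-clock estimate, assuming equal-length, independent jobs."""
    return math.ceil(n_jobs / max_workers) * minutes_per_job


if __name__ == "__main__":
    jobs = list(range(100))
    print("serial estimate:", estimated_minutes(len(jobs), 1, 1), "minutes")
    print("7 workers estimate:", estimated_minutes(len(jobs), 1, 7), "minutes")
    run_parallel(jobs, max_workers=7)
```

Under these assumptions, a hundred one-minute jobs need about seven concurrent workers to finish within fifteen minutes. Real jobs differ in length and may depend on each other, so the estimate is only a rough guide.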
The scalability and throughput are fine, and that's what we were looking for. Most of our company's data arrives at any time between midnight and ten AM, so we need to update it frequently to get closer to ninety percent, or even four nines, of the data. At that point the data is valid enough to make corrections, estimates, or plans.
A lot of people use AWS Data Pipeline, which feeds the Redshift data warehouse and data ops. The user community of fifty to sixty members gets its reports directly from Excel connections to Redshift or through Tableau.
How are customer service and support?
Once, when we were having self-stopping issues, I contacted the technical support team. After working on it, the team said that the feature we needed was available in the latest version: if you have Glue 3.0, you can do that. One of our Glue jobs hadn't been updated to 3.0, hence the problem.
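A check like the following could catch a job that was left on an older Glue version. It is a minimal sketch, assuming job descriptions shaped like the entries that boto3's Glue `get_jobs` call returns (dicts with `Name` and `GlueVersion` keys). The helper names and the sample data are hypothetical, and treating a missing `GlueVersion` as 0.9 is an assumption about the service default.

```python
def parse_version(version: str) -> tuple:
    """Turn a version string such as '3.0' into a comparable tuple (3, 0)."""
    return tuple(int(part) for part in version.split("."))


def jobs_below(jobs, minimum="3.0"):
    """Return the names of jobs whose Glue version is older than `minimum`.

    Assumption: a job without a 'GlueVersion' key is treated as 0.9.
    """
    floor = parse_version(minimum)
    return [
        job["Name"]
        for job in jobs
        if parse_version(job.get("GlueVersion", "0.9")) < floor
    ]


if __name__ == "__main__":
    # Hypothetical sample data; in practice the list could come from
    # boto3.client("glue").get_jobs()["Jobs"] (pagination not shown).
    sample = [
        {"Name": "load_orders", "GlueVersion": "3.0"},
        {"Name": "load_stock", "GlueVersion": "2.0"},
        {"Name": "legacy_job"},
    ]
    print(jobs_below(sample))
```

Running such a check before each deployment would flag any job still on an earlier version, which is the situation that caused the problem described above.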
The technical support team is good. When you find the right person and have a specific question, they are good, but with a general query you might not get what you expect. I rate the support a nine out of ten.
How was the initial setup?
The way we do it in our organization made it more complex, because infrastructure as code using Terraform is good but rigid, and it expects a certain skill level to use it more elaborately. Still, we're managing to do that, so we don't just have people going into a console, creating services, and controlling them. For the actual Glue code, we have worked out, together with AWS, a full CI/CD DevOps approach to check in code, do pushes, and do migrations gracefully. It's semi-automatic, because you must review and approve each code push, which works well. We had many problems during the deployment process, but we got there.
Maintenance of AWS Data Pipeline is manual. We have a very lean team of only five members, and most of them know how it was designed and built. That is how the maintenance happens.
What's my experience with pricing, setup cost, and licensing?
The way we use it, I think the pricing is fair, as we're getting good value for money compared to having a server or some other data pipeline. I think Azure would work out more expensive.
It is a good solution, as I do not have to specify a large resource. It runs well and takes fifteen minutes to get through the pipeline.
What other advice do I have?
Since we were able to take Glue as it normally is and make it simpler, I rate the overall solution a nine out of ten.
Disclosure: I am a real user, and this review is based on my own experience and opinions.