Examples of the 84,000+ reviews on PeerSpot:

Senior Director Data Architecture at Managed Markets Insight & Technology, LLC
Real User
Top 5
A tool with great orchestration and development capabilities but needs to improve its user-defined functions
Pros and Cons
  • "The most valuable feature of the solution is that orchestration and development capabilities are easier with the tool."
  • "The user-defined functions have shortcomings in AWS Data Pipeline."

What is our primary use case?

In my company, we use AWS Data Pipeline along with AWS Step Functions so that we can orchestrate the whole ecosystem via AWS Step Functions.

We use AWS Data Pipeline to transport data from one location to the other. In our company, we also do data engineering workloads as part of AWS Data Pipeline.

An email comes into AWS SQS and triggers Lambda, which kicks off Step Functions. Step Functions then uses the content of the request to decide what to do, for example whether there is an attachment that it has to process. All of this is part of our use case for the product. We also have AWS Batch calling and scraping content from the web and putting it in S3, which triggers AWS Data Pipeline to consume that web content. API calls in AWS send data to and get responses from expert AI, after which we do more aggregation and persist the data back to S3.
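The first hop of the flow described above (a queue message triggering Lambda, which starts a Step Functions execution) could look roughly like the sketch below. It is only an illustration under stated assumptions: the state machine ARN is a placeholder, the message body is assumed to be JSON, and the `attachments` and `has_attachment` field names are hypothetical rather than taken from the reviewer's system.

```python
import json


def handler(event, context, sfn_client=None,
            state_machine_arn="arn:aws:states:REGION:ACCOUNT:stateMachine:EXAMPLE"):
    """Lambda entry point: start one Step Functions execution per SQS message."""
    if sfn_client is None:
        # Imported lazily so the sketch can be exercised without AWS access.
        import boto3
        sfn_client = boto3.client("stepfunctions")

    started = []
    for record in event.get("Records", []):
        request = json.loads(record["body"])  # assumes a JSON message body
        # Hypothetical field: record whether the request carries an attachment
        # so the state machine can branch on it.
        request["has_attachment"] = bool(request.get("attachments"))
        response = sfn_client.start_execution(
            stateMachineArn=state_machine_arn,
            input=json.dumps(request),
        )
        started.append(response["executionArn"])
    return {"started": started}
```

Passing the client in as a parameter is a design choice for the sketch only; it lets the handler be tried locally with a stand-in object before it is wired to a real queue.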

What is most valuable?

The most valuable feature of the solution is that orchestration and development capabilities are easier with the tool. It provides out-of-the-box functions, and you don't have to write a lot of code while using the product.

What needs improvement?

The user-defined functions have shortcomings in AWS Data Pipeline. I would like to be able to write a custom function and embed it in AWS Data Pipeline as a gadget, rather than as Python code that is part of a single pipeline, so that the gadget can be reused across the business units.

In the future, I want AWS to inform me when I am about to go beyond my node limit and to allow me to keep using the tool if I talk to a representative and figure out a way.

For how long have I used the solution?

I have experience with AWS Data Pipeline.

What do I think about the stability of the solution?

It is a good, stable solution since it doesn't choke while running pipelines. In Glue, six to seven minutes were needed to bring up a cluster before one could do a job run. AWS Data Pipeline is much better and more stable than Glue, and bootstrapping is also faster in Data Pipeline.

What do I think about the scalability of the solution?

In the tool, parallel processing is contingent on capacity, in the sense that you always have to watch the cap on the compute behind AWS Data Pipeline. I am capped at 200 nodes, and if I try to use more than 200 nodes, the pipeline fails. AWS doesn't tell me when I have almost reached my limit, and it doesn't offer to let me go beyond the limit if I talk to a representative and figure it out. No such warnings are issued by AWS, and the nodes simply fail if I go beyond the cap.
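Until such a warning exists on the AWS side, a client-side check before submitting work is one possible workaround. The sketch below is hypothetical: the 200-node figure is the cap quoted in this review (an account's real limit may differ), and the function name and the 90% warning threshold are assumptions made for illustration.

```python
NODE_CAP = 200  # the cap mentioned in this review; your account limit may differ


def check_node_budget(requested_nodes, cap=NODE_CAP, warn_ratio=0.9):
    """Classify a planned run as 'ok', 'warn' (close to the cap), or 'over'."""
    if requested_nodes > cap:
        return "over"
    if requested_nodes >= cap * warn_ratio:
        return "warn"
    return "ok"
```

A scheduler could call this before launching a run and alert the team on "warn" instead of finding out from failed nodes.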

How are customer service and support?

I rate the technical support a seven or eight out of ten.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I have used Azure Synapse, which has the same set of features as AWS Data Pipeline, though it takes a different approach that makes it easier to use than AWS Data Pipeline.

How was the initial setup?

The initial setup of AWS Data Pipeline was easy.

On a scale of one to ten, where one is the most difficult and ten is the easiest, I rate the setup phase an eight.

What's my experience with pricing, setup cost, and licensing?

I rate the pricing between six and eight on a scale from one to ten, where one is a low price and ten is a high price.

What other advice do I have?

Overall, I rate the product a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Geoffrey Leigh - PeerSpot reviewer
Chief Data Strategy and Governance Architect at INDEPENDENT PURCHASING COOPERATIVE, INC
Real User
Top 10
A stable, scalable, and reliable solution for moving and processing data
Pros and Cons
  • "It is a stable solution...It is a scalable solution."
  • "It's almost semi-automatic because you must review and approve code push, which works well. Still, we had many problems getting there during the deployment process, but we got there."

What is our primary use case?

We first receive lots of data in a flat file that is propagated or consolidated by one of our partners from all the operational data sources and the supply chain.

What is most valuable?

It's a service model, and we only pay for what we need rather than the server sitting in the corner that may or may not always need three environments.

What needs improvement?

We're only considering enhancing the presentation layer to give a more multidimensional OLAP view that AWS seems to have decided on. Redshift with the data mart structure is like an OLAP cube. Oracle Analytics Cloud is overkill and is not what we need. I was looking at Mondrian, which used to be part of the open-source stack from another vendor and which works. Still, I am also looking at some of the other OLAP environments like Kaiser, and we may perhaps decide to go to Azure with Microsoft Azure analysis cloud, but that's not multidimensional either, as SSAS used to be.

We tried Mondrian, and it didn't perform how we expected. So, we are looking at setting up something to perform as an OLAP in the cloud, particularly AWS, or else we might consider an Azure solution.

For how long have I used the solution?

I have experience with AWS Data Pipeline.

What do I think about the stability of the solution?

It is a stable solution. We have had no problem with AWS because AWS Glue's service is fine and robust for its Data Pipeline. We've never had any problems with it, and we've been there over the last year or so.

What do I think about the scalability of the solution?

It is a scalable solution. I originally set it up to run totally serially. For example, if each job took a minute and I had a hundred jobs, it used to take a hundred minutes to get them all done. We weren't considering parallelism previously, but now we've moved to parallelism, and all those hundred jobs execute in fifteen minutes or less.
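The arithmetic behind those figures can be sketched as follows. It is a simplified model, not a description of how the jobs are actually scheduled: it assumes the jobs are independent, each takes the same time, and they run in waves of a fixed number of parallel workers. The function names are made up for this illustration.

```python
import math


def wall_clock_minutes(num_jobs, minutes_per_job, workers):
    """Elapsed time when equal-length, independent jobs run in waves of `workers`."""
    waves = math.ceil(num_jobs / workers)
    return waves * minutes_per_job


def min_workers_for_deadline(num_jobs, minutes_per_job, deadline_minutes):
    """Smallest worker count whose wall-clock time fits within the deadline."""
    waves_allowed = deadline_minutes // minutes_per_job
    if waves_allowed < 1:
        raise ValueError("deadline is shorter than a single job")
    return math.ceil(num_jobs / waves_allowed)
```

Under these assumptions, a hundred one-minute jobs take 100 minutes on one worker, and at least seven parallel workers are needed to finish within fifteen minutes.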

The scalability and throughput are fine, and that's what we were looking for because the way that our company gets its data is anytime between midnight and ten AM for most of the data, so we need to sort of frequently update it so that we get closer to the ninety or four nines of the data so that the data is valid enough to make some corrections or estimation or plans.

A lot of people are using AWS Data Pipeline and put into the Redshift data warehouse and data ops. The user community of fifty to sixty members gets the reports directly from Excel connections to Redshift or through Tableau holes.

How are customer service and support?

Once, when the device was having self-stopping issues, I contacted the technical support team. After working on it, the team said that the feature we needed was available in the latest version. For instance, if you have Glue 3.0, you can do that. One of the Glue jobs hadn't been updated to 3.0, hence the problem.

The technical support team is good. So when you find somebody and have a specific question, they are good, but when you have a general query, then you might not get what you expect. I rate the support a nine out of ten.

How would you rate customer service and support?

Positive

How was the initial setup?

The way we do it in our organization made it more complex because infrastructure as code using Terraform is good but rigid and expects a certain skill level to use it more elaborately. Still, we're managing to do that so we don't just have people going into a console, creating services, and controlling them. And then, with the actual Glue code, along with AWS, we will work out a way of making a full CI/CD DevOps approach to check in code, do pushes, and do migrations gracefully. It's almost semi-automatic because you must review and approve code pushes, which works well. We had many problems during the deployment process, but we got there.


AWS Data Pipeline is manual. So there's a very lean team of only five members. And most have the experience of how it's been designed and built. That is how the maintenance happens.

What's my experience with pricing, setup cost, and licensing?

The way we use it, I think it is fair as we're getting a good value for money compared to having a server or some other data pipeline. Azure, I think, would work out more expensive.

It is a good solution as I do not have to specify a large resource. It gives a good run and takes fifteen minutes to get to the pipeline.

What other advice do I have?

Since we took what Glue normally is to make it simpler, I rate the overall solution a nine out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.