Associate Consultant at a tech vendor with 10,001+ employees
Real User
Top 20
An extremely user-friendly and stable tool requiring an easy initial setup
Pros and Cons
  • "The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users."
  • "The solution could be cheaper. The price of the solution is an area that needs improvement."

What is our primary use case?

Currently, we are utilizing AWS Glue for various ETL workloads, specifically in the life sciences domain. Our primary objective is to acquire data from various sources. Then, we store it in Redshift. This is where the complete use case of AWS Glue comes into the picture.

What is most valuable?

The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users.

What needs improvement?

The solution could be cheaper. The price of the solution is an area that needs improvement.

For how long have I used the solution?

I have been using AWS Glue in my organization for a year. I am an end-user and a customer of the solution.

Buyer's Guide
AWS Glue
April 2024
Learn what your peers think about AWS Glue. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,924 professionals have used our research since 2012.

What do I think about the stability of the solution?

It is a stable solution. We have not faced any issues in the past year, so it's pretty stable. Stability-wise, I rate it a ten out of ten.

What do I think about the scalability of the solution?

The solution has proven to be scalable, and from my experience in the data engineering domain, I rate it an eight out of ten. It is worth noting that I may not be the most qualified person to provide a rating since I mostly manage and work on data-related tasks. Currently, approximately 20-25 people in our company use the solution.

How are customer service and support?

I had no experience with the technical support team of AWS Glue.

Which solution did I use previously and why did I switch?

Previously, I used Azure Data Factory, but I did not find it very helpful. It was a bit complex and not that user-friendly. I am also much more comfortable with AWS services than with Azure services.

How was the initial setup?

The initial setup of the solution is straightforward, and I find it easy to implement. I rate the setup process a nine on a scale of one to ten, where ten is the easiest. As for the deployment process, we usually request our platform team to handle it, and they are quite efficient in deploying and managing the infrastructure. Although I am not directly involved in the deployment process, my understanding is that it can be completed in just a few hours with the help of two to three team members. Our platform team consists of data engineers, architects, and platform engineers who cater to the needs of various projects and products within the AWS ecosystem. Fortunately, the solution does not require any maintenance.

What's my experience with pricing, setup cost, and licensing?

Price-wise, the solution is adequate, and we have no issues with it. We believe that the cost is justified given the number of users and the features it provides. Overall, it can be considered an average-priced tool. I would rate the solution a six or seven on a scale of one to ten, with ten being very expensive. Specifically, I rate its pricing a six out of ten.

Which other solutions did I evaluate?

Before choosing AWS Glue, I evaluated Azure Data Factory.

What other advice do I have?

I would tell those planning to use AWS Glue to try it. I rate the overall solution a ten out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Sainagaraju Vaduka - PeerSpot reviewer
Data solution architect at a pharma/biotech company with 5,001-10,000 employees
Real User
Top 10
Excellent scalability, with valuable features, and profitable return on investment
Pros and Cons
  • "The most valuable features currently are Glue Studio, jobs, and triggers."
  • "I would like to see stable libraries; at the moment, they are not there."

What is our primary use case?

We are primarily using it for batch processing and transformations.

How has it helped my organization?

We have a large set of data on which we perform transformations and identification. We clean and transform the data, then put it into the destination table. It is very convenient.

What is most valuable?

The most valuable features currently are Glue Studio, jobs, and triggers.

What needs improvement?

I would like to see stable libraries; at the moment, they are not there.

For how long have I used the solution?

I have been using AWS Glue for the past five years.

What do I think about the stability of the solution?

In terms of stability, I would consider it to be an extension of Apache Spark.

What do I think about the scalability of the solution?

The scalability is good, and we are working with three hundred projects.

Which solution did I use previously and why did I switch?

Previously, we used EMR, Informatica, Data Pipeline, and Azure Data Factory.

How was the initial setup?

The initial setup is straightforward.

What about the implementation team?

We did our deployment in-house with the CI/CD integrations like GitHub and deployed the code on Glue. 

What was our ROI?

We are seeing a very good return on our investment.

What's my experience with pricing, setup cost, and licensing?

The current cost is around forty to fifty thousand a month.

What other advice do I have?

I would definitely recommend using AWS Glue for batching procedures. I would rate AWS Glue an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Consultant Data junior at a computer software company with 51-200 employees
Consultant
Top 20
User-friendly visual interface, but only a few built-in transformations
Pros and Cons
  • "The most valuable feature for me is the visual interface of AWS Glue."
  • "The product has only a few built-in transformations."

What is our primary use case?

The primary use cases of AWS Glue in our organization are for implementing ETL processes and for data flow.

What is most valuable?

The most valuable feature for me is the visual interface of AWS Glue. It is user-friendly and it is not complicated. Moreover, the coding part of AWS Glue allows users to upload their scripts after dropping some components. The product has flexibility and scalability, which is common in most cloud tools.

What needs improvement?

The product has only a few built-in transformations; support for building additional custom transformations could be improved in the next release.

As an additional feature, I would like documentation that maps the features of legacy ETL tools to their equivalents in AWS, to make it easier for users to migrate their ETL processing to the cloud. It would save time and help users find the best transformation or solution to satisfy their new business needs.

For how long have I used the solution?

I have been using this solution for three months, and I am using the latest version.

What do I think about the stability of the solution?

The stability is good; I have not faced any crashes so far.

What do I think about the scalability of the solution?

I would rate its scalability a seven out of ten.

Which solution did I use previously and why did I switch?

I used a product called SysTrack. For me, it was just a switch from SysTrack to AWS Glue.

What's my experience with pricing, setup cost, and licensing?

The pricing depends on the usage, such as the number of users, computers, and the time jobs run.

What other advice do I have?

Overall, I would rate this product a seven out of ten. It is a good product, but I have not experienced all the additional features.

Which deployment model are you using for this solution?

Private Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Manager at a construction company with 51-200 employees
Real User
Top 20
Excellent capabilities, proven stability, however would like a more robust interface on the no-code side
Pros and Cons
  • "We have found it beneficial when moving data from one source to another."
  • "I would like to see a more robust interface on the no-code side. It would be nice to be able to split cells."

What is our primary use case?

Our primary use case is ETL.

How has it helped my organization?

We have found it beneficial when moving data from one source to another.

What is most valuable?

In terms of convenience, the most valuable feature is the drag-and-drop, which is really nice. The no-code interface is really nice too, being able to drag in my connectors. Another nice thing is that it generates the framework, the wireframe of your code, so you can then just add whatever Spark or Python you want to make any further transformations.

What needs improvement?

I would like to see, in general, documentation on the limitations on which libraries you can actually pull in when you are running Python. The addition of the Python Jupyter Notebook has been nice. But generally speaking, you cannot import every library. You can import pandas now and you can use Boto, but you cannot import a lot of the other statistics-based libraries. That is currently an issue. I would also like to see a more robust interface on the no-code side. It would be nice to be able to split cells.

For how long have I used the solution?

I have been using AWS Glue for the past three years.

What do I think about the stability of the solution?

The stability is excellent.

What do I think about the scalability of the solution?

The scalability is good: you set up your minimum and maximum users, and you are ready to implement.

How was the initial setup?

The initial setup is straightforward. If you are just doing a file format conversion, then it is very simple, but if you want to do more robust transformations, like inserting transformations or transformations on multiple delimiters, then there is a bit of a learning curve. The deployment time is literally minutes.
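To give a sense of what a "transformation on multiple delimiters" can involve, here is a minimal sketch in plain Python. It is not a Glue API and not the reviewer's code: the function name split_multi, the delimiter set, and the sample rows are all hypothetical, chosen only to show the kind of logic one might add to a generated script.

```python
import re

def split_multi(line, delimiters=("|", ";", ",")):
    """Split one text line on any of several delimiters and trim whitespace.

    The delimiter set is an assumption for illustration; adjust it to the data.
    """
    # Escape each delimiter so characters like "|" are treated literally,
    # then join them into a single alternation pattern.
    pattern = "|".join(re.escape(d) for d in delimiters)
    return [field.strip() for field in re.split(pattern, line)]

# Hypothetical sample rows that mix delimiters within the same file.
rows = ["id|name;city", "1, Ada; London", "2|Grace,New York"]
records = [split_multi(row) for row in rows]

for record in records:
    print(record)
```

A single-delimiter file needs none of this, which is consistent with the point above that simple format conversions are easy while mixed-delimiter inputs take more work.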

What other advice do I have?

I would rate AWS Glue a seven on a scale of one to ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Cloud Data Engineer at jems groupe
Real User
Great for serverless data transformations but more resources are needed for running Spark jobs
Pros and Cons
  • "The solution is serverless so it allows us to transform data while optimizing the cost and performance of Spark jobs."
  • "The solution should offer features for streaming data in addition to batching data."

What is our primary use case?

Our company is creating data warehousing in the cloud. Our team includes four data engineers, two data ops, and two data administrators. 

We use S3 as a data lake to prepare data from two databases, which are hosted in MySQL and Oracle. For the migration, we use DMS.

Then, we use the solution to perform data transformation. For Oracle, we use Data Catalog and Data Crawler to create our catalog. Dev Endpoint is used to develop complex data transformations. We then migrate to Studio Notebook where we develop and schedule a complex Spark job. 

Finally, we load the transformed data to Redshift so our data analyst team can visualize it with QuickSight. 

What is most valuable?

The solution is serverless so it allows us to transform data while optimizing the cost and performance of Spark jobs. 

The solution works with many data sources and services in the cloud. 

Glue Watch monitors our Spark jobs and immediately alerts us to issues so we are able to resolve them quickly. 

What needs improvement?

The solution does not work with Spark DataFrame. We can use the solution's DynamicFrame for this function but transformations are expensive. 

Not enough resources or services are available to run managed Spark jobs within the solution. We have reached out to Amazon many times regarding this issue. 

The solution should offer features for streaming data in addition to batching data. We can use other products such as Scala or Python but prefer the features be available in the solution. 

For how long have I used the solution?

I have been using the solution for one year. 

What do I think about the stability of the solution?

The solution is stable with no issues. 

What do I think about the scalability of the solution?

The solution is scalable. 

How are customer service and support?

Technical support has been good and has handled any issues. 

I rate technical support an eight out of ten. 

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

The solution is the best service in its category at this time. Based on project budget and use case, we use either the solution or EMR.

EMR is used for projects that require the latest version of Spark. 

We use the solution for any other versions of Spark. 

How was the initial setup?

I was not involved in the initial setup.

What's my experience with pricing, setup cost, and licensing?

The solution's pricing is based on DPUs so it is a good idea to optimize use or it can get expensive. 

I use Studio Notebook because it is less expensive and jobs can be deleted or clustered to run in one day. 

I rate pricing a four out of ten. 
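Because pricing is based on DPUs, a rough cost estimate is simple arithmetic: DPUs multiplied by run time in hours multiplied by the rate per DPU-hour. The sketch below assumes a rate of 0.44 USD per DPU-hour purely for illustration; actual rates vary by region and job type, and the job names and figures are hypothetical. It also ignores any billing minimums or rounding rules.

```python
from dataclasses import dataclass

# Assumed illustrative rate in USD per DPU-hour; check current pricing for your region.
ASSUMED_RATE_PER_DPU_HOUR = 0.44

@dataclass
class JobRun:
    name: str
    dpus: float          # DPUs allocated to the run
    minutes: float       # run duration in minutes
    runs_per_month: int  # how often the job runs in a month

def cost_per_run(run, rate=ASSUMED_RATE_PER_DPU_HOUR):
    """Estimate the cost of one run: DPUs x hours x rate."""
    return run.dpus * (run.minutes / 60.0) * rate

def monthly_cost(runs, rate=ASSUMED_RATE_PER_DPU_HOUR):
    """Sum the estimated cost of all runs over a month."""
    return sum(cost_per_run(r, rate) * r.runs_per_month for r in runs)

# Hypothetical workload: one nightly batch job and one small hourly job.
workload = [
    JobRun("nightly_load", dpus=10, minutes=30, runs_per_month=30),
    JobRun("hourly_sync", dpus=2, minutes=6, runs_per_month=720),
]

for run in workload:
    print(f"{run.name}: {cost_per_run(run):.2f} USD per run")
print(f"estimated monthly total: {monthly_cost(workload):.2f} USD")
```

An estimate like this makes it easier to see which jobs dominate the bill and where reducing DPUs or run time would pay off most.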

Which other solutions did I evaluate?

Our company only uses Amazon cloud because other cloud environments do not offer the same features. 

The solution's Studio uses a GUI, which is easier than coding in Python Spark or Scala Spark. 

Azure Data Factory's features do not compare to what the solution can do in the cloud. 

What other advice do I have?

The solution is good for teams who do not want to worry about DevOps or who want to optimize cost by using the cloud. 

I rate the solution a seven out of ten. 

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Data Engineer at a computer software company with 501-1,000 employees
Real User
It efficiently collects and catalogs the data but needs to improve performance
Pros and Cons
  • "It is a stable and scalable solution."
  • "It fails to handle massive databases acquired from various sources."

What is our primary use case?

We use the solution to collect customers' data containing multiple files and convert it into a common database. Later, we send the database for SQL ingestion.

What is most valuable?

The solution's most valuable feature is its ability to efficiently collect and catalog the data in the warehouse.

What needs improvement?

They should improve the solution's performance with large amounts of data. Currently, AWS fails to handle massive databases acquired from various sources. Also, it is challenging to queue the data or use standard code in the AWS environment. We need to install a third-party tool to tackle the issue. We need to use another tool to convert the data as well. Thus, we are using multiple tools to handle the database. They should work on this particular area.

For how long have I used the solution?

We have been using the solution for one year.

What do I think about the stability of the solution?

It is a stable solution. I rate its stability as an eight.

What do I think about the scalability of the solution?

I rate the solution's scalability as a six.

How was the initial setup?

The initial setup is a bit complex, and I rate the process as a six. We have to install multiple third-party tools whenever we update the security patches or renew the solution. Thus, the deployment process is complicated.

What other advice do I have?

If you already have an AWS environment, you can opt for AWS Glue for its ETL operations feature. If you want to handle multiple operations, such as creating a table or catalog, or need it for machine learning purposes, it is better to go for other database tools.

I rate the solution as a seven.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer:
PeerSpot user
Sashi Dhar - PeerSpot reviewer
Operations executive at Wipro Infotech
Real User
Top 20
Good support, user-friendly, and AWS-integrated
Pros and Cons
  • "It is AWS-integrated. There is end-to-end integration with the other AWS services. It is also user-friendly."
  • "There should be more connectors for different databases."

What is our primary use case?

We are using it for day-to-day ETL jobs. It is being used to transfer data from Teradata to the cloud.

We are using its latest version.

What is most valuable?

It is AWS-integrated. There is end-to-end integration with the other AWS services. It is also user-friendly.

What needs improvement?

There should be more connectors for different databases.

For how long have I used the solution?

I have been using this solution for almost a year.

What do I think about the stability of the solution?

It is stable.

What do I think about the scalability of the solution?

It is scalable. We have almost 40 users.

How are customer service and support?

Their support is very good. I would rate them a five out of five.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We were not using any other solution previously.

How was the initial setup?

It was straightforward. Within a couple of hours, it was done.

What other advice do I have?

Before you start using it, you need to know PySpark.

I would rate it a nine out of ten. It is good for what we are using it for.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
CEO and Founder at HartB
Real User
Improved our time to implement a new ETL process and has a good price and scalability, but only works with AWS
Pros and Cons
  • "The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features."
  • "The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS."

What is our primary use case?

It is a good tool for us. All the implementation in our company is done with AWS Glue. We use it to execute all the ETL processes. We have collected more or less five terabytes of information from the internet by now. We process all this data in our cloud platform and normalize the information. We first put it on a data lake that we have here on the AWS tool. After that, we use AWS Glue to transform all the information collected around the internet and put the normalized information into a data warehouse.

How has it helped my organization?

It has improved the time to implement a new ETL process by 30%. We have also seen a big improvement in the data science area.

What is most valuable?

The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features.

What needs improvement?

The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS.

For how long have I used the solution?

I have been using this solution for two years.

What do I think about the stability of the solution?

In terms of stability, we had some problems in the past, but now, it is okay. AWS provides SLA, and the integration of the tools is good.

What do I think about the scalability of the solution?

Scalability is a very strong point of this solution as compared to other solutions like PowerCenter and Pentaho. In Pentaho, you need to install a lot of machines, but in AWS Glue, you just need to find out how many instances you need. You just put this information in a form and click okay. Magically, you have the scaled processes. 

We have 35 users of this solution, and they are engineers, DevOps, and data scientists. We have a lot of plans to increase the usage of AWS Glue in 2021.

How are customer service and technical support?

In the first year of using it, we had a lot of problems with the solution. Our team found more or less five bugs if I remember correctly. Our experience with AWS support was very good. The team in the US helped us to resolve the problems and fix the bugs. We are AWS partners.

Which solution did I use previously and why did I switch?

Before AWS Glue, we worked with Talend, PowerCenter, and Pentaho. In the case of PowerCenter, the biggest problem for us was the plugins because they were too expensive. That was the negative point of PowerCenter. 

In the case of Talend, the problem was that in Brazil, we didn't have professionals with the skills to work with Talend. In addition, we had to use the command-line interface, which was a terrible thing because it took more time as compared to other solutions.

In the case of Pentaho, we had the same problem as Talend. We didn't have a lot of professionals. Of course, we have some courses to train people in Pentaho. We work with the biggest companies in Brazil, and we need professionals every day, but we don't have professionals with experience in Pentaho.

How was the initial setup?

The initial setup process is totally easy. You just need to put some information in the forms, and then you just need to click some buttons, and it is complete. The process to provide a new infrastructure with AWS Glue takes from 10 minutes to an hour.

What about the implementation team?

We have all the professionals inside the company.

What's my experience with pricing, setup cost, and licensing?

Its price is good. We pay as we go or based on the usage, which is a good thing for us because it is simple to forecast for the tool. It is also good in terms of the financial planning of the company, and it is a good way to estimate the cost. It is also simple for our clients.

In my opinion, it is one of the best tools in the market for ETL processes because of the fact that you pay as you use, which separates it from other big tools such as PowerCenter, Pentaho Data Integration, and Talend.

What other advice do I have?

I would rate AWS Glue a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
PeerSpot user
Product Categories
Cloud Data Integration