2020-09-03T07:49:46Z

What needs improvement with AWS Glue?

Julia Miller - PeerSpot reviewer
PeerSpot user

28 Answers

Senthil Kumar Veerasamy - PeerSpot reviewer
Real User
Top 10
2024-01-18T08:38:03Z
Jan 18, 2024

Since AWS Glue is not like an enterprise ETL tool, we need to put quite a lot of effort into customization. The solution has a visual editor, but most ETL transformations cannot be implemented or constructed using that. We always have to do a script. The solution's visual ETL tool is of no use for actual implementation.

ParamShah - PeerSpot reviewer
MSP
Top 10
2024-01-16T09:21:00Z
Jan 16, 2024

There are output limitations, and the configuration of its three parts took a lot of trial and error. It is not clear how partition discovery would be affected by more data coming in. We made some expensive mistakes that could have been avoided if tutorials or easy documentation with FAQs had been available. There is documentation, but it doesn't cover everything. There are three specific partition changes, and AWS Glue is tightly tied to Athena. We don't have much flexibility in managing Athena. AWS Glue could integrate with an AI model or a more advanced version that processes chat-based inputs rather than configuration. This would align it more closely with chat-based interfaces, making it easier to adopt.

NM
Real User
Top 5
2023-10-09T14:32:26Z
Oct 9, 2023

I have encountered challenges with multi-region support.

Neelabh Sharma - PeerSpot reviewer
Real User
Top 10
2023-09-11T14:24:31Z
Sep 11, 2023

The product is expensive for data streaming compared to EMR. This area needs improvement.

Mbaye Babacar Gueye - PeerSpot reviewer
Real User
Top 5
2023-09-01T19:46:13Z
Sep 1, 2023

One area that could be improved is the ETL view. The drag-and-drop interface is not as user-friendly as some other ETL tools. Additionally, AWS Glue can sometimes be slow, especially when processing large datasets. Also, I couldn't directly use bucketed data. With Elastic Glue, you had to convert your data frames into the correct format before connecting them using the drag-and-drop interface. I didn't like that because the conversion process wasn't straightforward. In future releases, I would like to see a feature that could trigger a Glue pipeline using an API or something similar.
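On the last point, a Glue job run can be started programmatically through the AWS SDK. The following is a minimal sketch, assuming Python with boto3 and a job that already exists. The job name `nightly-etl` and the `--run_date` argument are hypothetical, and the client is injectable so the function can be exercised without AWS credentials.

```python
def start_glue_job(job_name, arguments=None, client=None):
    """Start a run of an existing Glue job and return its run ID."""
    if client is None:
        # Imported lazily so the sketch loads even where boto3 is not installed.
        import boto3
        client = boto3.client("glue")

    params = {"JobName": job_name}
    if arguments:
        # Job arguments are passed as a dict of string keys and string values.
        params["Arguments"] = arguments

    response = client.start_job_run(**params)
    return response["JobRunId"]
```

A call such as `start_glue_job("nightly-etl", {"--run_date": "2023-09-01"})` could then be placed behind a scheduler or an HTTP endpoint, depending on how the pipeline needs to be triggered.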

RajKumar23 - PeerSpot reviewer
Real User
Top 5
2023-08-03T09:08:10Z
Aug 3, 2023

The solution’s stability could be improved.

AmitMataghare - PeerSpot reviewer
Real User
Top 5
2023-08-03T04:25:26Z
Aug 3, 2023

AWS Glue Studio has undergone a lot of enhancements in the last couple of months. The solution would improve if the user interface became more user-friendly and allowed transformations to be built with features like drag and drop. It would also be a good improvement if the product itself supported different kinds of transformations, so that the pipeline we want to create could be built easily; right now, we have to write code to do so in our company. Only people who can code, either in Java or Python, can use the product freely. Those who don't know Java or Python might find using AWS Glue difficult. AWS has pricing for spot instances that reduces the cost substantially, but that is not available for AWS Glue. Spot instance pricing exists for products like EC2, and if the same were introduced for AWS Glue, the price could drop substantially.

Ajaykumar Myana - PeerSpot reviewer
Real User
Top 10
2023-07-31T17:41:50Z
Jul 31, 2023

In terms of performance, if they can further optimize the execution time for serverless jobs, it would be a welcome improvement. Faster code execution would be beneficial. If AWS could enhance the serverless execution capabilities, like increasing CPU, RAM, and processing speed, that would be great.

Shifa Shah - PeerSpot reviewer
Real User
Top 5
2023-05-24T12:30:04Z
May 24, 2023

While working on AWS Glue, I could not find any training material for it. Although it's not a problem with the product, the solution could include better documentation.

Vimalathithan M - PeerSpot reviewer
Real User
Top 20
2023-04-26T09:07:00Z
Apr 26, 2023

We face performance issues when using AWS Glue for data transformation and integration. It takes almost three to four hours to execute single transformations, which is a lot. We want to improve the performance to meet customer requirements. Mainly, I am focused on improving the performance aspect because the customer is keen on this improvement.

SP
Real User
Top 20
2023-04-20T10:59:00Z
Apr 20, 2023

The solution could be cheaper. The price of the solution is an area that needs improvement.

YB
Consultant
Top 20
2023-03-09T22:01:42Z
Mar 9, 2023

The product has only a few built-in transformations; support for building additional custom transformations could be improved in the next release. For additional features, I would like documentation that maps the features of legacy ETL tools to their equivalents in AWS, to make it easier for users to migrate their ETL processing to the cloud. It would save time and help users find the best transformation or solution to satisfy their new business needs.

Syed Zakaulla - PeerSpot reviewer
Integrator
Top 5 Leaderboard
2023-02-13T20:14:36Z
Feb 13, 2023

AWS Glue had some issues, which required optimization, particularly in terms of the number of workers you deploy, and that's where costing comes in. Cost-wise, AWS Glue is expensive, so that's an area for improvement. My company did some modifications, which turned out to be successful, so overall, the solution works fine. Even though there is a backup, you need to know what's happening. You need to understand why there's a failure. AWS Glue doesn't provide the information, so my company uses its logs. The development team also doesn't have specific answers because the team is still playing around with the process, which means the company is still trying to figure out other areas for improvement in AWS Glue. The process for setting up the solution was also complex, which is another area for improvement. AWS should provide help during migration and assist its users. Otherwise, it's a nightmare.
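The link between worker count and cost can be made concrete with a rough estimate. This is a sketch under stated assumptions: Glue Spark jobs are billed per DPU-hour, a G.1X worker counts as 1 DPU and a G.2X worker as 2 DPUs, and the rate of 0.44 USD per DPU-hour is an illustrative figure that varies by region and should be checked against current AWS pricing. The function name and the example numbers are hypothetical, and minimum billing durations are ignored.

```python
# Assumed DPU weight per worker type; verify against current AWS documentation.
DPU_PER_WORKER = {"G.1X": 1, "G.2X": 2}


def estimate_job_cost(worker_type, num_workers, runtime_minutes,
                      usd_per_dpu_hour=0.44):
    """Return an approximate cost in USD for one Glue job run."""
    dpus = DPU_PER_WORKER[worker_type] * num_workers
    hours = runtime_minutes / 60
    return round(dpus * hours * usd_per_dpu_hour, 2)


# Ten G.1X workers running for three hours.
print(estimate_job_cost("G.1X", 10, 180))
# Twenty G.2X workers running for ninety minutes.
print(estimate_job_cost("G.2X", 20, 90))
```

Under these assumptions, doubling the worker count only pays off if the runtime drops by more than half, which is why tuning the number of workers matters for the bill.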

BV
Real User
Top 20
2023-01-19T18:04:06Z
Jan 19, 2023

In general, I would like to see documentation on the limitations on which libraries you can actually pull in when you are running Python. The addition of the Python Jupyter Notebook has been nice. Generally speaking, though, you cannot import every library. You can import branders now and you can use photos, but you cannot import many of the other statistics-based libraries. That is an issue currently. I would also like to see a more robust interface on the no-code side. It would be nice to be able to split cells.

Joaquin Marques - PeerSpot reviewer
Real User
Top 5 Leaderboard
2022-11-25T20:48:52Z
Nov 25, 2022

The mapping area and the use of the data catalog from Glue could be better. I would say those two are the main things we'd like to see improvements on. The solution needs support for big data. As I understand it, Glue is based on Lambdas and Lambdas have some limitations as far as running them continuously. Sometimes they get dropped, and they have to be reinitialized.

Sainagaraju Vaduka - PeerSpot reviewer
Real User
Top 10
2022-10-28T15:16:30Z
Oct 28, 2022

I would like to see stable libraries; at the moment, they are not there.

Murilo Hallgren - PeerSpot reviewer
Real User
Top 10
2022-10-17T14:45:15Z
Oct 17, 2022

The price of the solution could improve.

Liana Iuhas - PeerSpot reviewer
Real User
Top 5
2022-09-01T11:06:20Z
Sep 1, 2022

The interface for AWS Glue could improve; it does not show a lot of detail. You can write the code in PySpark or in Scala, which is a big advantage, but it is only easy to use for a developer. It will be difficult for new users to enter the cloud environment. If business users want to run their own graphs, they will not be able to use features such as running code inside AWS Glue in Spark, which will be complex for them.

Ankit Shukla - PeerSpot reviewer
Real User
Top 5
2022-07-20T15:04:13Z
Jul 20, 2022

The monitoring is not that good. We'd like job progress to be clearer; right now, the way we can view it is not that good. The issue is that it is mostly Python or Scala code based. The UX is lacking. There is a bit of a learning curve, particularly during the setup process. More connectors should be included.
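Job state can also be polled outside the console through the AWS SDK. The following is a minimal sketch, assuming Python with boto3: `get_job_run` returns a `JobRun` record, and the `JobRunState` and `ErrorMessage` fields are read from it. The function name, the polling interval, and the set of states treated as terminal are assumptions to verify against the API reference.

```python
import time

# States treated as final in this sketch (an assumption, not an exhaustive list).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def wait_for_job_run(job_name, run_id, client=None, poll_seconds=30,
                     sleep=time.sleep):
    """Poll a Glue job run until it finishes; return (state, error_message)."""
    if client is None:
        # Imported lazily so the sketch loads even where boto3 is not installed.
        import boto3
        client = boto3.client("glue")

    while True:
        run = client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            # ErrorMessage is only present when the run did not succeed.
            return state, run.get("ErrorMessage")
        sleep(poll_seconds)
```

The `sleep` parameter is injectable so the loop can be tested without real delays; in normal use the default is enough.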

Sashi Dhar - PeerSpot reviewer
Real User
Top 20
2022-07-18T07:42:56Z
Jul 18, 2022

There should be more connectors for different databases.

Diksha Hirole - PeerSpot reviewer
MSP
Top 10
2022-07-01T09:23:35Z
Jul 1, 2022

There are a couple of issues with AWS Glue. First, AWS Control randomly logs off, which disturbs coding. Second, if there's a cluster-related configuration, we have to set up worker nodes, which is quite a headache when processing a large amount of data. In the next release, AWS Glue should include more transformations with AWS Studio.

Suraj Sachdeva - PeerSpot reviewer
Real User
Top 10
2022-06-21T13:28:38Z
Jun 21, 2022

The technical support for this solution could be improved. In future, we would like to connect more services like Athena or Kinesis to help control more loads of data.

Jorge Encinas - PeerSpot reviewer
MSP
Top 20
2022-06-16T15:42:50Z
Jun 16, 2022

It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options.

DS
Real User
2021-12-02T16:14:50Z
Dec 2, 2021

There is a learning curve to this tool.

DB
Real User
2021-10-21T11:50:32Z
Oct 21, 2021

When there is a need to configure connections to different database sources in respect of the target, it would be good if it were easier to deal with roles. I am referring to the need to configure connections in a different target process, something which would require a certain time outlay for configuring VPC and checking that everything is okay, in respect of the creation of required roles. It would save time were this process to be made easier and more user friendly. The technical support depends on the type of question, whether there is a need to understand additional inter-related information on multiple levels. Overall, I consider the technical support to be fine, although the response time could be faster in certain cases.

BR
Real User
2020-12-17T18:52:47Z
Dec 17, 2020

The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS.

AS
Real User
2020-10-14T06:36:55Z
Oct 14, 2020

Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background.

CE
Real User
2020-09-03T07:49:46Z
Sep 3, 2020

The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3.

AWS Glue is a serverless cloud data integration tool that facilitates the discovery, preparation, movement, and integration of data from multiple sources for machine learning (ML), analytics, and application development. The solution includes additional productivity and data ops tooling for running jobs, implementing business workflows, and authoring. AWS Glue allows users to connect to more than 70 diverse data sources and manage data in a centralized data catalog. The solution facilitates...