What is our primary use case?
I first used AWS Glue on my first project, where I automated large-scale ETL pipelines with AWS Glue and Python. I also used it for data cataloging and transformation workflows.
Since then, I have consistently used AWS Glue in other projects for ETL orchestration, especially in AWS-based environments, and it is one of my go-to tools for data integration.
What is most valuable?
The main benefit is that AWS Glue has reduced our effort by 60%. It scales automatically, which reduces operational time.
Regarding data catalog integration, the Glue Data Catalog helps manage metadata across various services. Because I work heavily with PySpark, and AWS Glue supports it, it is very easy for me to build and debug pipelines.
For workflow orchestration, it is easy to define triggers, dependencies, and workflows within AWS Glue. Overall, AWS Glue is very efficient and integrates well with the AWS ecosystem.
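As a rough illustration of the triggers and dependencies mentioned above, here is a minimal sketch of a two-step workflow: a scheduled trigger starts one job, and a conditional trigger starts a second job once the first succeeds. The workflow name, job names, and helper functions are hypothetical; the dictionary keys follow the parameters of the boto3 Glue `create_trigger` call, and `glue_client` is assumed to be a `boto3.client("glue")` object.

```python
def build_workflow_triggers(workflow_name, extract_job, transform_job, cron):
    """Return keyword arguments for two Glue create_trigger calls.

    All names passed in are illustrative; nothing here talks to AWS.
    """
    start = {
        "Name": f"{workflow_name}-start",
        "WorkflowName": workflow_name,
        "Type": "SCHEDULED",
        "Schedule": cron,  # Glue cron syntax, e.g. "cron(0 2 * * ? *)"
        "Actions": [{"JobName": extract_job}],
        "StartOnCreation": True,
    }
    after_extract = {
        "Name": f"{workflow_name}-after-extract",
        "WorkflowName": workflow_name,
        "Type": "CONDITIONAL",
        # Fire only when the extract job has finished successfully.
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "JobName": extract_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": transform_job}],
        "StartOnCreation": True,
    }
    return [start, after_extract]


def create_workflow_with_triggers(glue_client, workflow_name, triggers):
    """Create the workflow, then attach each trigger to it.

    glue_client is assumed to be boto3.client("glue") or a compatible object.
    """
    glue_client.create_workflow(Name=workflow_name)
    for trigger in triggers:
        glue_client.create_trigger(**trigger)
```

Keeping the trigger definitions as plain dictionaries means they can be reviewed and stored in source control before anything is created in the account.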
What needs improvement?
AWS Glue is efficient, but a common issue I face is that job logs can be difficult to trace when troubleshooting complex PySpark transformations, especially compared to other orchestration tools.
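One way to make such logs easier to follow is to tag every line with the job, the run, and the transformation step. The sketch below uses only the Python standard library and is not a Glue feature; the helper names, the job name `orders_etl`, and the run id `run-001` are hypothetical, and the in-memory buffer stands in for wherever the job's log stream actually goes.

```python
import io
import logging
import time
from contextlib import contextmanager


def make_job_logger(job_name, run_id, stream=None):
    """Return a logger whose every line carries the job name and run id."""
    logger = logging.getLogger(f"etl.{job_name}.{run_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(stream)  # stream=None means stderr
        handler.setFormatter(
            logging.Formatter("%(levelname)s job=%(job)s run=%(run)s %(message)s")
        )
        logger.addHandler(handler)
    return logging.LoggerAdapter(logger, {"job": job_name, "run": run_id})


@contextmanager
def traced_step(log, step):
    """Log the start, duration, and any failure of one transformation step."""
    started = time.perf_counter()
    log.info("step=%s status=started", step)
    try:
        yield
    except Exception:
        # Records the traceback together with the step name, then re-raises.
        log.exception("step=%s status=failed", step)
        raise
    else:
        elapsed = time.perf_counter() - started
        log.info("step=%s status=ok seconds=%.2f", step, elapsed)


# Usage: wrap each transformation so a failure can be tied to a named step.
buffer = io.StringIO()  # stands in for the job's log stream
log = make_job_logger("orders_etl", "run-001", stream=buffer)
with traced_step(log, "dedupe_orders"):
    rows = [1, 2, 2, 3]
    unique_rows = sorted(set(rows))
```

With consistent `job=`, `run=`, and `step=` fields, a log search can be narrowed to one run and one step instead of scanning the whole driver output.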
Glue jobs also lack robust version control for jobs and scripts that is integrated directly into the console.
Compared with tools such as Airflow, Glue workflows are still relatively basic in terms of flexibility and complex branching. AWS has improved Glue over time, but it still lags behind other tools in these areas.
For how long have I used the solution?
I have more than five years of experience with AWS Glue.
What was my experience with deployment of the solution?
For the initial setup, I find it easy to set up the Data Catalog and create Glue jobs using the visual editor or code. Setting permissions via IAM roles can be a bit tricky at the start, because we have to make sure Glue has access to Amazon S3, Redshift, and other services.
Once the role is configured, it runs smoothly. Advanced configurations, such as connecting to VPCs and setting up connections to JDBC sources, take more time, but overall, for someone with cloud and ETL experience, the setup is manageable and well designed.
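For readers new to the IAM part, the sketch below shows roughly what the two policy documents for a Glue job role look like: a trust policy that lets the Glue service assume the role, and a permissions policy for one S3 bucket. The function names and the bucket name `example-data-lake` are hypothetical, and a real setup usually also needs the AWS managed `AWSGlueServiceRole` policy and whatever Redshift or other access the jobs require.

```python
import json


def glue_trust_policy():
    """Trust policy allowing the AWS Glue service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_access_policy(bucket):
    """Read/write access to a single bucket (the bucket name is an assumption)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading and writing apply to the objects inside it.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Render both documents as JSON, ready to paste into the IAM console or a template.
trust_json = json.dumps(glue_trust_policy(), indent=2)
s3_json = json.dumps(s3_access_policy("example-data-lake"), indent=2)
```

Splitting bucket-level and object-level actions into separate statements is the usual source of the early confusion: `s3:ListBucket` must target the bucket ARN, while `s3:GetObject` and `s3:PutObject` must target the objects under it.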
How are customer service and support?
My experience with the support team from Amazon has been positive, as technical issues are usually addressed very quickly, and the support engineers provide detailed guidance and follow-ups.
For complex Glue-related problems such as job failures or permission issues, their documentation is good, but having direct access to support helps cut down troubleshooting time significantly.
In my experience, they have been proactive in suggesting best practices and optimizations when contacted. While response times on the basic support tiers can be slower, overall, I have had a positive experience with AWS support.
I would rate AWS support eight out of ten because they are technically strong and helpful in debugging complex Glue and cloud issues, and they are very responsive.
What's my experience with pricing, setup cost, and licensing?
AWS Glue's pricing is reasonable: it is not overly expensive, but it is not cheap either.
What other advice do I have?
I would recommend AWS Glue to other people.
AWS Glue is very user-friendly and provides full management of ETL (extract, transform, load) pipelines, helping to prepare and combine large amounts of data for analytics, machine learning, and application development.
It automates the heavy lifting of ETL jobs, including schema discovery, code generation, and job scheduling. I recommend AWS Glue because it is serverless, requires no infrastructure management, is very scalable and flexible, and can handle small to large datasets.
It integrates easily with AWS services, making it simple to build data pipelines, and it supports both Python and Scala for ETL scripts.
Overall rating: 8 out of 10.
Which deployment model are you using for this solution?
Public Cloud
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)