Spark SQL has been in our stack for less than one year, though some of our colleagues are using it. It is a useful product for transformation jobs.
We use Spark SQL mainly for general batch processing, and we have also used it for some streaming jobs.
We did not spend much additional effort on Spark jobs.
Speed is the major benefit of using Spark SQL.
Spark SQL is interoperable with Hive. While migrating from HDFS to Iceberg, we did not need to change our Spark SQL job configurations, only the location or type of connection from Hive to Trino or S3. This interoperability proved to be very useful.
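To illustrate the idea of changing only the connection while the job's SQL stays the same, here is a minimal sketch. It assumes an Iceberg catalog registered in Spark through the `spark.sql.catalog.<name>` settings. The catalog name `lake`, the table `analytics.events`, the metastore URI, and the bucket path are hypothetical and are not taken from the setup described in this review.

```python
from typing import Dict

CATALOG = "lake"  # hypothetical catalog name


def catalog_conf(kind: str, location: str) -> Dict[str, str]:
    """Build Spark settings for an Iceberg catalog named CATALOG.

    kind="hive":   location is a Hive Metastore thrift URI.
    kind="hadoop": location is a warehouse path, for example on S3.
    """
    prefix = f"spark.sql.catalog.{CATALOG}"
    conf = {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": kind,
    }
    if kind == "hive":
        conf[f"{prefix}.uri"] = location
    elif kind == "hadoop":
        conf[f"{prefix}.warehouse"] = location
    else:
        raise ValueError(f"unsupported catalog type: {kind}")
    return conf


# The query text is identical whichever connection is configured.
JOB_SQL = (
    f"SELECT event_date, COUNT(*) AS events "
    f"FROM {CATALOG}.analytics.events GROUP BY event_date"
)


def apply_conf(builder, conf: Dict[str, str]):
    """Apply settings to any builder exposing .config(key, value),
    such as pyspark's SparkSession.builder."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Assumed endpoints, for illustration only.
hive_conf = catalog_conf("hive", "thrift://metastore.example:9083")
s3_conf = catalog_conf("hadoop", "s3a://example-bucket/warehouse")
```

With PySpark and the Iceberg runtime available, one would pass either dictionary through `apply_conf(SparkSession.builder, ...)` and then run `spark.sql(JOB_SQL)` unchanged.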
We do not have any performance problems, but we do have some resource problems. Spark SQL consumes so many resources that we migrated our streaming job from Spark to Apache Flink.
Resource management in Spark SQL should be better. It is normal for it to consume more resources, but memory and CPU consumption is the main reason we switched from Spark. The number of streaming jobs in our company has been increasing, which is why we treat resource management as a priority.
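The way memory demand adds up as long-running streaming jobs multiply can be estimated with simple arithmetic. This is a rough sketch, not a measurement from our cluster. It assumes Spark's documented default memory overhead (10% of the configured memory, with a 384 MiB minimum) for both executors and the driver. The job sizes below are hypothetical.

```python
def overhead_mib(memory_mib: int) -> int:
    """Default off-heap overhead Spark adds on top of the configured memory:
    10% of it, but never less than 384 MiB (assumed default factor)."""
    return max(384, int(memory_mib * 0.10))


def job_memory_mib(executors: int, executor_memory_mib: int,
                   driver_memory_mib: int) -> int:
    """Approximate memory one Spark job asks the cluster manager for."""
    per_executor = executor_memory_mib + overhead_mib(executor_memory_mib)
    driver = driver_memory_mib + overhead_mib(driver_memory_mib)
    return executors * per_executor + driver


def fleet_memory_mib(jobs: int, executors: int, executor_memory_mib: int,
                     driver_memory_mib: int) -> int:
    """Streaming jobs hold their containers continuously, so the total
    grows linearly with the number of identical jobs."""
    return jobs * job_memory_mib(executors, executor_memory_mib,
                                 driver_memory_mib)


# Hypothetical sizing: 4 executors of 4 GiB each and a 2 GiB driver per job.
one_job = job_memory_mib(4, 4096, 2048)
ten_jobs = fleet_memory_mib(10, 4, 4096, 2048)
print(one_job, ten_jobs)
```

Under these assumptions a single job requests about 20 GiB, and ten such jobs about 200 GiB, held for as long as the streams run.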
Setting the resource consumption aside, I would say the development experience with Spark SQL is better. For development purposes, it is a top product and not difficult to work with, but resources are the major problem. Development takes less time in Spark than in Flink, yet we changed to Flink regardless of the development time.
We first migrated from Spark to NiFi for some streaming tasks. It was a good alternative for us, but we later moved from NiFi to Flink because NiFi had some problems ingesting data into Iceberg or S3.
We are not looking for any other options. We generally use open-source products and support the open-source movement, so Apache products are at the top of our list.
We do not pay for Apache products. We always use the free, open-source versions.
We are currently dealing with Iceberg, Flink, Airflow, and Trino.
Regarding the Catalyst query optimizer, I think we are using it. We used it a long time ago, but I am not certain whether we still use it now.
I rate my experience with Spark SQL as an eight out of ten. I have been working in this field for eight years.