We primarily use the solution to run our jobs; we run our Spark jobs on it.
The most valuable aspect of the solution is its notebook. It's quite convenient to use, both in terms of research and development and in terms of the final deployment; I can just declare the Spark jobs by loading tables.
The solution could be improved by integrating it with data packets. Right now, the load tables provide a team collaboration function. Still, it's unclear whether there's a function to create different branches, or more branches. Our team had used data packets before; however, I find it difficult to integrate the current setup with the previous data packets.
The support could be improved a bit around the database. When we stream to the Data Lake, some data cannot be loaded. Fixing this should be a priority.
I've been using the solution for half a year.
The solution is scalable. However, it still requires us to manually set the number of nodes in a cluster. It's really dependent on the application; sometimes, when the tasks are bigger, it gets a little difficult for us to define the number of nodes. If the solution could set up the clusters automatically for users, that would be good.
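The manual sizing described above can often be relaxed by declaring autoscaling bounds and letting the platform pick the node count per workload. A minimal sketch follows; the field names mirror the Databricks Clusters API (`autoscale.min_workers`/`max_workers`), and the cluster name, runtime version, and node type are placeholder assumptions:

```python
def cluster_spec(min_workers: int, max_workers: int) -> dict:
    """Build a cluster definition that lets the platform choose the
    node count within [min_workers, max_workers] instead of a fixed size."""
    if min_workers < 1 or max_workers < min_workers:
        raise ValueError("need 1 <= min_workers <= max_workers")
    return {
        "cluster_name": "prediction-jobs",    # hypothetical name
        "spark_version": "13.3.x-scala2.12",  # example runtime tag
        "node_type_id": "i3.xlarge",          # example node type
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
        },
    }

spec = cluster_spec(2, 8)
```

Bigger tasks can then scale toward `max_workers` without anyone editing the cluster definition by hand.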
Currently, we have three people using the solution. We may increase usage in the future.
The technical support is quite good. In the beginning, when we had a few POC projects, they were very supportive.
We didn't previously use a different solution; rather, we built our own from scratch. This is the first unified platform that we've used.
The initial setup is very straightforward. We just use their job functions. Deploying as a Spark job is quite straightforward.
In our use case, we also had some external databases to handle during deployment. For example, we generated prediction results and saved them into an external database. The solution takes time to deploy to the external database, but the Spark job itself is quite easy.
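The write-out step described above can be sketched with Spark's JDBC writer. This is an illustrative assumption of the workflow, not the team's actual code: the URL, table name, driver, and helper names are placeholders, and `df` stands in for a Spark DataFrame holding the prediction results.

```python
def jdbc_options(url: str, table: str, user: str, password: str) -> dict:
    """Collect the options Spark's JDBC data source expects."""
    return {
        "url": url,          # e.g. "jdbc:postgresql://host:5432/db" (placeholder)
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",  # assumes a Postgres target
    }

def save_predictions(df, options: dict) -> None:
    """Append the DataFrame's rows to the external table via JDBC."""
    (df.write
       .format("jdbc")
       .options(**options)
       .mode("append")
       .save())
```

The Spark job itself stays simple; the slow part in practice is the round trip to the external database, which is consistent with the experience described above.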
I'm a software development engineer. I'm working with the latest version.
As long as the developers have an understanding of Spark and its technical tricks, using the database is very fast.
I'd rate the solution eight out of ten.