There are several valuable features.
- Interactive data access (low latency)
- Batch ETL-style processing
- Schema-free data models
- Algorithms
We've seen up to a 1000x performance improvement over other techniques, which has enabled interactive, self-service access to data.
Better integration with BI tools would be a much appreciated improvement.
I've used it for about 14 months.
I haven't had any issues with deployment.
It's been stable for us.
It's scaled without issue.
Customer service is excellent.
Technical Support: Excellent.
Yes, we previously used Oracle, from which we ported our data.
The initial setup was simple.
We implemented it with our in-house team.
Be sure to use the Apache versions and avoid vendor-specific extensions.
It allows the loading and investigation of very large datasets, and it includes MLlib for machine learning, Spark Streaming, and both the new and old DataFrame APIs.
We're able to perform data discovery on large datasets without too much difficulty.
It needs better documentation as well as examples for all the Spark libraries. That would be very helpful in maximizing its capabilities and results.
I've used it for over nine months now.
I haven't encountered any issues with deployment.
There have been no stability issues.
I haven't had any scalability issues. It scales better than Python and R.
I haven't had to use customer service.
Technical Support: I haven't had to use it.
I previously used Python and R, but neither of these scaled particularly well.
The initial setup was complex. It was not easy getting the correct version and dependencies set up.
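One way to reduce the version-and-dependency pain described above is to pin exact versions up front. A minimal sketch for a pip-based PySpark setup (the version numbers are examples, not recommendations):

```shell
# Pin an exact PySpark version so the driver and cluster match (version is an example).
pip install pyspark==3.5.1

# Spark also needs a compatible JVM on the machine; check what is installed.
java -version
```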
I implemented it in-house on my own!
It's open-source, so ROI is inapplicable.
Learn Scala as this will greatly reduce the pain in starting off with Spark.
Spark Streaming, Spark SQL and MLlib, in that order.
We have been using Spark to do a lot of batch and stream processing of inbound data from Apache Kafka. Scaling Spark on YARN is still an issue but we are getting acceptable performance.
As I said, scalability is still an issue, and so is stability. Spark on YARN still doesn't seem to have a programmatic job-submission API, so we have to rely on the spark-submit script to run jobs on YARN. The Scala and Java APIs also have performance differences, which sometimes requires coding in Scala.
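For reference, the spark-submit invocation mentioned above looks roughly like this (the main class, jar name, and resource sizes are hypothetical):

```shell
# Submit a batch job to YARN in cluster mode (class, jar, and sizes are examples).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 4 \
  --executor-memory 4g \
  --class com.example.IngestJob \
  ingest-job.jar
```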
Have Scala developers on hand. Basic Java competency will not be enough during optimization rounds.
I am using Apache Spark for data migration from databases. We have customers who use one database as a data lake.
The most valuable feature of Apache Spark is its ease of use.
Apache Spark could improve the use-case scenarios on its website. There is no information on how you can use the solution across relational databases, or with multiple databases.
I have been using Apache Spark for approximately 18 months.
Apache Spark is stable.
We are using Apache Spark across multiple nodes and it is scalable.
We have approximately five people using this solution.
The technical support from Apache Spark is very good.
I rate Apache Spark an eight out of ten.
