We performed a comparison between Apache Hadoop and Vertica based on real PeerSpot user reviews.
Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"It's open-source, so it's very cost-effective."
"Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."
"The performance is pretty good."
"We selected Apache Hadoop because it is not dependent on third-party vendors."
"Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial."
"The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics."
"Initially, with RDBMS alone, we had a lot of work and few servers running on-premise and on cloud for the PoC and incubation. With the use of Hadoop and ecosystem components and tools, and managing it in Amazon EC2, we have created a Big Data "lab" which helps us to centralize all our work and solutions into a single repository. This has cut down the time in terms of maintenance, development and, especially, data processing challenges."
"One valuable feature is that we can download data."
"Vertica is a great product because customers can compress and code data. The infrastructure that data warehouse solutions need is a commodity server so that customers don't have to invest in infrastructure."
"DBAs don’t need to add a partition every month/quarter like with other DBs."
"Vertica has a few features that I like. From an architecture standpoint, they have separated compute and storage. So you have low-cost object storage for primary storage and the ability to have several sub-clusters working off the same ObjectStore. So it provides workload isolation."
"Vertica is a columnar database where the query performance is extremely fast and it can be used for real-time integrations for API and other applications. The solution requires zero maintenance which is helpful."
"It maximizes cloud economics with Eon Mode by scaling cluster size to meet variable workload demands."
"The most valuable feature of Vertica is the ability to receive large aggregations at a very quick pace. The use case of subclusters is very good."
"Any novice user can tune Vertica queries with minimal training (or no training at all)."
"For me, it's performance, scalability, low cost, and it's integrated into enterprise and big data environments."
"It needs better user interface (UI) functionalities."
"The solution is very expensive."
"The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
"The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop."
"The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."
"Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them."
"I think more of the solution needs to be focused around the panel processing and retrieval of data."
"It would be good to have more advanced analytics tools."
"The documentation of Vertica is an area with shortcomings where improvements are required."
"It should provide a GUI interface for data management and tuning."
"Limitations in group by projections is where I would like to see an improvement."
"The integration with AI has room for improvement."
"It needs integration with multiple clouds."
"I would personally like to see extended developer tooling suited to Vertica – think published PowerDesigner SQL dialect support."
"Very bad support, I would rate it two out of 10."
"Fact-to-fact joins on multi-billion record tables perform poorly."
Apache Hadoop is ranked 5th in Data Warehouse with 31 reviews while Vertica is ranked 4th in Data Warehouse with 82 reviews. Apache Hadoop is rated 7.8, while Vertica is rated 8.4. The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". On the other hand, the top reviewer of Vertica writes "A user-friendly tool that needs to improve its documentation part". Apache Hadoop is most compared with Microsoft Azure Synapse Analytics, Azure Data Factory, Oracle Exadata, Snowflake and Oracle Big Data Appliance, whereas Vertica is most compared with Snowflake, SQL Server, Amazon Redshift, Teradata and SingleStore.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.
SQreamDB is a GPU database. It is, of course, not suitable for real-time OLTP.
Cassandra is best suited to OLTP use cases where you need a scalable database (instead of SQL Server or Postgres).
SQream is a GPU database suited to OLAP. It is the best fit for a very large data warehouse with very large queries that need massively parallel execution, since GPUs excel at massively parallel workloads.
Also, SQream is quite cheap, since we need only one server with a GPU card; the better the GPU card, the better, since we will have more CPU activity. It is only for a very big data warehouse, not for small ones.
Your best DB for 40+ TB is Apache Spark, Drill and the Hadoop stack, in the cloud.
Use the public cloud provider's elastic store (S3, Azure Blob Storage, Google Cloud Storage) and then stand up Apache Spark on a cluster sized to run your queries within 20 minutes. Based on my experience (Azure Blob Storage, Databricks, PySpark), you may need around 500 32 GB nodes to read 40 TB of data.
Costs can be contained by running your own clusters, but Databricks manages clusters for you.
I would recommend optimizing your 40 TB data store into the Databricks Delta format after an initial parse.
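The sizing figures in this answer (40 TB of data, about 500 nodes with 32 GB each, a 20-minute query window) can be sanity-checked with plain arithmetic. The sketch below is only a back-of-the-envelope calculation, not a Spark sizing method: the function name `estimate_cluster`, the `ClusterEstimate` fields, and the use of decimal units (1 TB = 1000 GB) are assumptions made for illustration, and real requirements depend on file format, compression, and the queries themselves.

```python
from dataclasses import dataclass

GB_PER_TB = 1000  # assumption: decimal units (1 TB = 1000 GB)


@dataclass(frozen=True)
class ClusterEstimate:
    """Hypothetical container for the derived sizing numbers."""
    total_memory_gb: float        # aggregate RAM across all nodes
    data_per_node_gb: float       # share of the data each node must read
    memory_to_data_ratio: float   # aggregate RAM divided by data size
    cluster_read_gb_per_s: float  # throughput needed to scan all data in time
    node_read_mb_per_s: float     # the same throughput, per node


def estimate_cluster(data_tb: float, nodes: int, node_memory_gb: float,
                     target_minutes: float) -> ClusterEstimate:
    """Derive rough per-node and aggregate figures for one full scan."""
    data_gb = data_tb * GB_PER_TB
    seconds = target_minutes * 60
    total_memory_gb = nodes * node_memory_gb
    cluster_read = data_gb / seconds
    return ClusterEstimate(
        total_memory_gb=total_memory_gb,
        data_per_node_gb=data_gb / nodes,
        memory_to_data_ratio=total_memory_gb / data_gb,
        cluster_read_gb_per_s=cluster_read,
        node_read_mb_per_s=cluster_read * 1000 / nodes,
    )


# Figures quoted in the answer: 40 TB, ~500 nodes of 32 GB, 20 minutes.
est = estimate_cluster(data_tb=40, nodes=500, node_memory_gb=32,
                       target_minutes=20)
print(est)
```

Under these assumptions the cluster holds less memory in aggregate than the data it reads, so the data would be streamed from the object store rather than cached in full, and the per-node read rate is the figure to compare against what the store can actually deliver.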
Morten, the most popular comparisons of SQream can be found here: https://www.itcentralstation.com/products/sqream-db-alternatives-and-competitors
The top ones include Cassandra, MemSQL, MongoDB, and Vertica.