What is our primary use case?
I have experience in ClickHouse, and we also use Apache Druid, which has corporate support from Druid, along with data products in Hadoop. We are currently exploring many platforms such as GMI, TKI, and Vertex.
I use ClickHouse for a merchant-side portal. We started by exploring how to use data coming from multiple sources such as logs, mainframe, Teradata, and the many file systems that feed the data lake. The real-time challenge was joining the data and serving analytical queries for our merchants, who work throughout the year to improve GMB and sales and to ensure the right quantity of items is ordered at the right time. We need fast analytical queries on large datasets, which is why we selected ClickHouse as our columnar OLAP database; it supports real-time analytics with its own SQL interface.
We have installed both local and Docker versions, which are quite scalable, and we usually connect them to BI tools such as Grafana, Superset, and Tableau. We also use materialized views, DDL partitions, and Python connectors and drivers for ClickHouse. It's exciting to see how ClickHouse has evolved, and we are evaluating ClickHouse Cloud alongside our on-premises version.
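As a rough illustration of the partitioning and materialized-view setup mentioned above, here is a minimal Python sketch. The table, view, and column names (sales_events, daily_item_sales, and so on) are hypothetical and not taken from our deployment, and the client is only assumed to expose a command(sql) method.

```python
# Hypothetical schema: all table, view, and column names are illustrative only.
RAW_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS sales_events (
    event_time DateTime,
    store_id   UInt32,
    item_id    UInt64,
    quantity   UInt32,
    amount     Decimal(18, 2)
) ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (store_id, item_id, event_time)
"""

# Target table that stores the pre-aggregated daily rollup.
ROLLUP_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS daily_item_sales (
    day            Date,
    store_id       UInt32,
    item_id        UInt64,
    total_quantity UInt64,
    total_amount   Decimal(38, 2)
) ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (store_id, item_id, day)
"""

# Materialized view that fills the rollup as rows arrive in sales_events.
ROLLUP_VIEW_DDL = """
CREATE MATERIALIZED VIEW IF NOT EXISTS daily_item_sales_mv
TO daily_item_sales AS
SELECT
    toDate(event_time) AS day,
    store_id,
    item_id,
    sum(quantity) AS total_quantity,
    sum(amount)   AS total_amount
FROM sales_events
GROUP BY day, store_id, item_id
"""

DDL_STATEMENTS = [RAW_TABLE_DDL, ROLLUP_TABLE_DDL, ROLLUP_VIEW_DDL]


def apply_ddl(client, statements=DDL_STATEMENTS):
    """Send each DDL statement through a client that exposes command(sql)."""
    for statement in statements:
        client.command(statement.strip())
    return len(statements)


class RecordingClient:
    """Stand-in client that records statements instead of contacting a server."""

    def __init__(self):
        self.executed = []

    def command(self, sql):
        self.executed.append(sql)


if __name__ == "__main__":
    client = RecordingClient()
    apply_ddl(client)
    for sql in client.executed:
        print(sql.splitlines()[0])
```

With a real server, the stand-in could be swapped for a driver client; for example, the clickhouse-connect package returns a client with a command method from clickhouse_connect.get_client. Connection details are left out here because they depend on the deployment.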
We are already a customer of ClickHouse: Sam's Club uses it on the merchant side and is also exploring it on the consumer side, primarily for user analytics, metrics, and streaming data analysis in ad tech. We also use it for custom analysis and metrics for fraud detection in payments and for ad campaign metrics, and various teams use it for ad campaign management and user behavior analytics, particularly customer behavior on e-commerce sites. It is used extensively because of its low latency, fast aggregations, columnar OLAP storage, quick joins, and real-time data visibility, all of which make ClickHouse very appealing to us.
What is most valuable?
ClickHouse is very easy to use. One of its good features is support for joins, which were not present in Druid. Druid was also quite expensive, and our applications at Sam's Club started utilizing ClickHouse very quickly.
ClickHouse deserves a rating of 9 when compared to competitors, particularly Druid, which is stable but comes with higher costs and subpar support. ClickHouse proves to be more lightweight, offering low latency and high throughput, along with joins, making it especially good for log and metrics handling.
What needs improvement?
The basic challenge with ClickHouse is the documentation, which isn't ideal. The product itself is mature and stable, and its columnar storage, compression, and parallel processing make it the best for OLAP. In terms of improvements, it's not designed for very frequent small writes, which makes it less scalable in write-intensive workloads. It's also not well suited to transactional use cases, and streaming ingestion, including batching and buffering, is something ClickHouse will improve.
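Because frequent small writes are the weak spot, one common workaround is to collect rows on the client and send them in larger batches. The following is a minimal Python sketch of that idea, not our production code: BatchBuffer, flush_fn, and max_rows are hypothetical names, and a real version would probably also flush on a timer.

```python
from typing import Callable, List, Sequence

Row = Sequence[object]


class BatchBuffer:
    """Collects rows and hands them to flush_fn in batches of up to max_rows."""

    def __init__(self, flush_fn: Callable[[List[Row]], None], max_rows: int = 10_000):
        if max_rows < 1:
            raise ValueError("max_rows must be at least 1")
        self._flush_fn = flush_fn  # e.g. a wrapper around a driver's insert call
        self._max_rows = max_rows
        self._rows: List[Row] = []

    def add(self, row: Row) -> None:
        """Buffer one row and flush automatically once the batch is full."""
        self._rows.append(row)
        if len(self._rows) >= self._max_rows:
            self.flush()

    def flush(self) -> None:
        """Send any buffered rows as one batch; do nothing if the buffer is empty."""
        if not self._rows:
            return
        batch, self._rows = self._rows, []
        self._flush_fn(batch)


if __name__ == "__main__":
    batches: List[List[Row]] = []
    buffer = BatchBuffer(batches.append, max_rows=2)
    for i in range(5):
        buffer.add((i, f"event-{i}"))
    buffer.flush()  # send the final partial batch
    print([len(batch) for batch in batches])  # prints [2, 2, 1]
```

Here flush_fn is just a list append so the sketch runs without a server. In practice it would wrap whatever insert call the chosen driver provides, so that each flush becomes one larger insert rather than many small ones.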
What do I think about the stability of the solution?
ClickHouse is quite stable, and it deserves a rating of 9.
What do I think about the scalability of the solution?
ClickHouse deserves a scalability rating of 8 since it's quite scalable but has some room for improvement regarding scaling challenges.
How are customer service and support?
Community support is available on platforms such as Stack Overflow and the ClickHouse Slack. Commercially, the company provides enterprise support, especially for Sam's Club through ClickHouse Cloud. We utilize AVN ClickHouse, which is effectively managed by AVN, providing bug fixes and developing new functionalities along with architecture reviews. I appreciate their 24/7 support, which is beneficial, although those using open source might face some challenges. Overall, the enterprise support is quite good.
How was the initial setup?
The initial setup for ClickHouse is relatively easy compared to Flink; however, for newcomers, it is quite challenging. I find it easier in terms of API, and single-node setups through Yum or apt take only a couple of minutes to install. Planning cluster setups is a bit more complex and is primarily an admin task. While a single-node setup is easy, managing ClickHouse Cloud is extremely easy. Creating clusters can vary from moderate to difficult based on the scale, typically 5 to 10 nodes, depending on the use case.
What other advice do I have?
I would recommend this solution. Overall rating: 9 out of 10.
Which deployment model are you using for this solution?
On-premises
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.