What is our primary use case?
In the cloud, we have various databases like Cosmos DB, customer databases, blob storage, and so on. We don't want to handle everything as JSON files. If data needs to be in a structured format, Snowflake is a good option because it's easy to deploy.
Additionally, Snowflake takes care of the underlying storage layer, offers good performance, and has a very good distributed architecture through virtual warehouses.
Plus, there's the option for security, handling data duplicates, and data governance – it's all in one package. And it can seamlessly integrate with streaming data, which is awesome.
And it also integrates with Python and machine learning, similar to Databricks now. So, it's a good competitor for Databricks.
How has it helped my organization?
Snowflake has enhanced our data warehousing capabilities. Mainly, it's the infrastructure. We can distribute data and create a multi-warehouse architecture for performance optimization.
We don't need to worry about memory or storage, and Snowflake Cloud handles licensing and storage. The integration with Azure lets us deploy data seamlessly from our data lake to Snowflake storage using copy scripts with transformations.
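To give a sense of what a copy script with transformations can look like, here is a minimal sketch that builds a Snowflake COPY INTO statement casting JSON fields into typed columns during the load. The table, stage, and field names are hypothetical and not taken from our environment, and the helper only produces the SQL text. Running it would require a Snowflake client and an existing external stage pointing at the data lake.

```python
def build_copy_statement(table, stage_path, columns):
    """Build a COPY INTO statement that casts JSON fields while loading.

    columns maps target column name -> (JSON field path, Snowflake type).
    """
    target_cols = ", ".join(columns)
    # $1 refers to the single VARIANT column of each staged JSON record.
    select_exprs = ", ".join(
        f"$1:{path}::{sql_type}" for path, sql_type in columns.values()
    )
    return (
        f"COPY INTO {table} ({target_cols})\n"
        f"FROM (SELECT {select_exprs} FROM @{stage_path})\n"
        "FILE_FORMAT = (TYPE = 'JSON')"
    )


# Hypothetical names, for illustration only.
statement = build_copy_statement(
    "app_logs",
    "azure_lake_stage/logs/",
    {"event_id": ("id", "NUMBER"), "message": ("msg", "STRING")},
)
print(statement)
```

The casts in the SELECT are the "transformation" part: the data lands in typed columns instead of being stored as raw JSON.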
Moreover, we primarily use the data sharing feature to store our logs, especially for machine learning use cases where structured data is essential.
Previously, we used Hive with MongoDB integration and a Hadoop layer. Now, we use Informatica to load unstructured data into Snowflake as a target. Then, we convert it into a structured format, making it usable for machine learning.
Snowflake's integrated features actually made the most significant impact on our analytics efficiency. Security, dynamic data masking, data cloning, data sharing, data sampling, ETL processes, and streaming processes – it's all in one product.
It's also a good product for batch processing. While Databricks may be superior for machine learning, Snowflake is still maturing in that area.
Moreover, Snowflake's performance speed was crucial to our project success when it comes to reporting capabilities. We use MicroStrategy as our centralized reporting front-end, which relies on Snowflake.
Also, when migrating data to a data warehouse, we can load it directly into Snowflake using Snowflake's integration with Azure Blob or AWS S3, with Informatica or Azure Data Factory (ADF) driving the load. This allows for a simple architecture with Snowflake functioning as a data lake, an intermediate transformation layer, or a source for sector data. And there's no complexity in handling JSON data.
Handling JSON files can be very complex, especially for hierarchical data, which needs to be split. Snowflake makes this very easy to handle.
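To illustrate what "splitting" hierarchical data means, here is a small plain-Python sketch that turns one nested record into flat rows. It is only an illustration of the idea, not Snowflake code. In Snowflake this kind of work is typically done in SQL on a VARIANT column, for example with the FLATTEN table function. The sample record and key names are made up, and the sketch assumes the list elements are themselves objects.

```python
def flatten_record(record, parent_key="", sep="_"):
    """Flatten nested dicts into a single-level dict of column -> value."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, column, sep))
        else:
            flat[column] = value
    return flat


def explode(record, list_key):
    """Emit one flat row per element of the list stored under list_key."""
    parent = {k: v for k, v in record.items() if k != list_key}
    rows = []
    for item in record.get(list_key, []):
        row = flatten_record(parent)
        row.update(flatten_record(item, list_key))
        rows.append(row)
    return rows


# Made-up sample record: one order with a nested customer and two items.
order = {
    "order_id": 1,
    "customer": {"name": "Ada", "country": "UK"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
rows = explode(order, "items")
for row in rows:
    print(row)
```

Each item becomes its own row, with the parent fields repeated alongside it, which is the structured shape that reporting and machine learning tools expect.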
What is most valuable?
Scalability is excellent, and we have had no problems scaling. The learning curve is very small. And there are a lot of advanced features like handling duplicates, security, data governance, data sharing, and data cloning.
Data sharing and data sampling are very easy – data can be shared with a third party with just a simple command. We also have stored procedures which can be deployed in the cloud, eliminating the need for relational databases. Snowflake acts like a relational database, but without the old-fashioned complexities.
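As a rough sketch of how short these commands are, the helpers below build the SQL for sharing one table with another account and for sampling a table. All object and account names are hypothetical, and the helpers only return SQL text. In practice a share usually involves a few short grant statements rather than a single one, but no data is copied.

```python
def share_table_statements(share, database, schema, table, consumer_account):
    """Return the SQL statements that expose one table through a share."""
    qualified = f"{database}.{schema}.{table}"
    return [
        f"CREATE SHARE {share}",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share}",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share}",
        f"GRANT SELECT ON TABLE {qualified} TO SHARE {share}",
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer_account}",
    ]


def sample_query(table, percent):
    """Return a query that reads roughly `percent` percent of the rows."""
    return f"SELECT * FROM {table} SAMPLE ({percent})"


# Hypothetical names, for illustration only.
statements = share_table_statements(
    "logs_share", "analytics", "public", "app_logs",
    "partner_org.partner_account",
)
for sql in statements:
    print(sql + ";")
print(sample_query("analytics.public.app_logs", 10) + ";")
```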
It can be a leader in the industry. Compared with Snowflake, something like Databricks requires you to know cloud infrastructure access, Python coding, and more, so its learning curve is much steeper.
With Snowflake, those core, familiar SQL skills allow you to handle data seamlessly. You don't need to worry about the infrastructure, security complexities, or the typical cloud-based architecture. This software can truly act as a data lake.
Compared with Blob storage, Azure SQL Database, or especially Databricks, Snowflake is very easy to learn and implement for all your analytics needs.
What needs improvement?
There are a few areas for improvement.
- Machine learning in Snowflake is still maturing and isn't as advanced as in other products; Databricks may be superior here. I haven't heard of any successful industry-wide use cases of machine learning implemented in Snowflake. It might take a couple of years to reach the same level as Databricks.
- Python integration is still an upcoming capability within Snowflake.
- Additionally, loading data from the cloud to Snowflake is straightforward, but we still lack the capability to do the reverse – from Snowflake to the cloud.
For how long have I used the solution?
I have been using it for two years.
What do I think about the stability of the solution?
I would rate the stability an eight out of ten.
What do I think about the scalability of the solution?
I would rate the scalability a nine out of ten. We can scale up instantly, which is Snowflake's main strength.
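For context, scaling a virtual warehouse up is a single statement. The sketch below builds that statement with a basic check on the size name. The warehouse name is hypothetical, and the size list is only a subset of what Snowflake offers.

```python
# A subset of Snowflake warehouse sizes, kept short for the example.
VALID_SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def resize_warehouse(name, size):
    """Return the ALTER WAREHOUSE statement that changes a warehouse's size."""
    size = size.upper()
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported size in this sketch: {size}")
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{size}'"


# Hypothetical warehouse name, for illustration only.
print(resize_warehouse("reporting_wh", "large"))
```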
There are around 28 end users using it. Most of the job roles include developers and data engineers.
How are customer service and support?
The customer service and support are actually good.
What about the implementation team?
Since Snowflake is a newer tool, there's a vast amount to learn, and not as many study materials or videos are publicly available. We had to do a lot of self-exploration.
Additionally, being a vendor-based product, there can be challenges integrating Snowflake with third-party tools outside of the seamless Azure and AWS integrations. For example, integrating with SAP ERP or SAP HANA might not be feasible at this time.
Deploying Snowflake within the cloud is very simple. There's no infrastructure to manage, just security, access, and pricing provisions. However, third-party integrations might be a different story.
Deployment takes just a couple of hours, which is definitely easier than deploying something like an Oracle server.
What's my experience with pricing, setup cost, and licensing?
User accounts are separate from compute and storage – there's a decoupling in Snowflake. This means separate handling of storage and compute.
Overall, the pricing model is very competitive compared to products like Databricks, especially for streaming and business analytics use cases.
It is not overly expensive. I would rate the pricing a six out of ten, with ten being the most expensive.
What other advice do I have?
I would definitely recommend using it but with a caveat: it's an excellent tool if your organization is fully in the cloud.
For hybrid cloud environments, it's not as feasible.
Overall, I would rate the solution an eight out of ten. It lacks built-in front-end reporting capabilities. You'll need to integrate with a third-party tool like Power BI.
Also, Snowflake's graphical features are limited since it's primarily a database. However, if it offered built-in features for front-end reporting dashboards that support enterprise needs, it could compete with Power BI.
Which deployment model are you using for this solution?
Private Cloud