What is our primary use case?
We invested in NetApp technology to address our scalability and modernization requirements. Another platform we manage and aim to enhance with NetApp technologies is backups, currently residing on NetApp E Series. Each site has five to seven E Series systems handling backups.
Compared to NetApp AFF or C Series, E Series is extremely reliable but lacks features, offering basic connectivity and performance but limited capabilities. For backups, we're looking to modernize our approach and leverage features like SnapLock for rapid recovery, increasing our capacity for mass restores in case of ransomware or other incidents. We're exploring the C Series with QLC drives, seeking the optimal fit and cost-effective solution. We anticipate the C Series ASAs will likely be the answer, providing the right balance of features and efficiency for our backup needs in the coming months.
How has it helped my organization?
NetApp technology has significantly improved how we operate and serve our customers. We were attempting to migrate from several legacy appliances that functioned adequately, but planning our next steps was challenging. Many vendors we worked with took a scattered approach. We would ask for a replacement block solution with specific features and a NAS solution with other specific features, both performing exceptionally well.
One vendor offered nine different ways to achieve this, unable to clearly identify the best fit for us. This may not seem significant until we realize their largest competitor couldn't narrow it down to two options when we asked them to. It was almost a paralysis of choice within their own offerings. So, we switched to two or three other vendors who offered seemingly decent solutions, but it felt like a technological step backward. Their solutions were monolithic, difficult to scale, and challenging to work with in the future. My manager and I had prior NetApp experience at another organization, but our current location had no NetApp presence. It was a significant mindset shift for my colleagues and me.
NetApp allowed us to move from a monolithic approach, where an array served a single purpose and was impossible to grow without spending millions to replace it, to a more flexible model. We initially purchased the FAS8200 and AFF A300 lines, which have served us well for several years. We've been able to scale horizontally with new projects and products requiring hundreds of terabytes of storage. Instead of buying entirely new systems, we can simply add more gear to the existing cluster and expand it. This approach has benefited not just a few specific products but the entire organization. It has enabled us to scale horizontally within days, instead of spending months onboarding a new vendor, installing, configuring, and learning a new appliance, only to find it doesn't meet our needs. NetApp has made us more agile in implementing new solutions and has reduced stress significantly. We've moved from constantly planning new purchases and integrations to simply deciding if we need more and, if so, adding it to the existing cluster. It's been an amazing shift from being reactive to proactive, empowering us to focus on other tasks. What used to be a full-time job for one person is now easily manageable for our three-person team.
The ever-changing cybersecurity landscape and the rise of AI have significantly impacted our technology infrastructure. We've isolated a few nodes from a former cluster, leaving them to function independently - they're like the forgotten stepchildren, enduring hardship without complaint. Thanks to extensive automation, we never have to log into them. These nodes support SnapLock, which provides peace of mind knowing the data is undeletable - even by admins or support staff. Only the passage of time, specifically the expiration of compliance timers, allows for deletion. Ironically, I designed a system so secure that it's now impeding my own migration efforts. An audit volume with a 90-day compliance timer remains on an old cluster, and I'm forced to wait for it to expire before proceeding. This underscores the effectiveness of the security measures, even if they cause minor inconveniences. Cybersecurity and autonomous ransomware protection have significantly improved management's peace of mind. Previously, fine-tuning access controls was necessary to prevent unauthorized actions. Now, additional safeguards protect end users from unintentional mistakes caused by external influences. These safeguards act like bumpers in a bowling alley, preventing significant harm unless we make a mistake and allow it. This reassurance lets us focus on our work without worrying about phishing attacks or other security breaches.
What is most valuable?
The NetApp technology that has delivered the most value to our organization is FabricPool. FabricPool was a game-changer for us because we previously had a large investment in nearline SAS drives on the FAS8200 platform. This platform was good at scaling horizontally and working as a hybrid storage model, with flash fronting and Flash Pools for reads and writes, tiering down to disk afterward as needed. FabricPool on the A-Series platform allowed us to consolidate six cabinets of gear into two.
With the NAS itself, we have our block array that cascades everything down to the FabricPool, which we've distributed throughout the data center. This means it doesn't have to live in the same dedicated cabinets or even the same row. Our data center operations team is happy because I told them they can put it anywhere as long as it has the necessary connectivity and power. This flexibility allows us to distribute storage more equitably and evenly. Everything gets prioritized on our flash tier. After 30 days of being cold, which is common in a hospital setting, data gets deprioritized down to cheaper storage, going from dollars per gig to pennies per gig. The initial investment in power was significant, but where previously it was a premium for us to offer flash to our end users, now I can offer it to everyone. FabricPool tiers off data after users are done editing in their scratch space, and after 30 days, it moves to the cold tier, saving the organization money. We've gone from maybe 60 terabytes of flash to over 400, and half of it sits available for scalable storage. Previously, storage allocation was rigid, but now everyone gets flash unless their workload has absolutely no need for it.
This has made everyone happier because they no longer have to complain about slow home directories or other performance issues.
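As a rough illustration of the 30-day cooling rule described above, here is a minimal Python sketch. It is only a simplified model of the idea (recently used data stays on flash, idle data moves to the cheaper tier); it is not how the array implements tiering, and the function name `tier_for` and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta

COOLING_DAYS = 30  # cooling period mentioned in the review


def tier_for(last_access: datetime, now: datetime,
             cooling_days: int = COOLING_DAYS) -> str:
    """Return 'flash' for recently used data, 'cold' once idle for the cooling period."""
    idle = now - last_access
    return "cold" if idle >= timedelta(days=cooling_days) else "flash"


# Hypothetical sample dates, for illustration only.
now = datetime(2024, 6, 30)
print(tier_for(datetime(2024, 6, 20), now))  # idle 10 days
print(tier_for(datetime(2024, 5, 1), now))   # idle 60 days
```

In this model, the first call lands on the flash tier and the second on the cold tier.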
What needs improvement?
One of the challenges we face with NetApp is identifying bottlenecks in the systems it integrates with. For example, we had a persistent issue with buffer credits, where a card in a UCS FI (fabric interconnect) was causing slowdowns due to its design limitations. This illustrates how a single component can hinder the whole system even with fast storage and compute. We were generating two billion buffer credit issues every 12 hours. Identifying and resolving this took nine months and involved replacing seven cards. While NetApp's Cloud Insights initially identified the issue, its cost of $150,000 per year made it unsustainable. This highlights the need for a more affordable solution to monitor the entire ecosystem, as relying on vendor analysis for isolated incidents is time-consuming. Ultimately, having comprehensive and cost-effective monitoring would save significant time and resources.
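To put the buffer credit figure in perspective, the short Python sketch below converts the two billion events per 12 hours quoted above into a per-second rate. The helper name `events_per_second` is hypothetical; only the two input figures come from the review.

```python
EVENTS = 2_000_000_000  # buffer credit issues quoted in the review
WINDOW_HOURS = 12       # observation window quoted in the review


def events_per_second(events: int, hours: float) -> float:
    """Average event rate over the window, in events per second."""
    return events / (hours * 3600)


rate = events_per_second(EVENTS, WINDOW_HOURS)
print(f"{rate:,.0f} events per second")  # prints "46,296 events per second"
```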
For how long have I used the solution?
I have been using NetApp products for almost 15 years.
Which other solutions did I evaluate?
When we first invested in NAS workloads around six or seven years ago, we were hiring a lot of people. We had considered pairing NAS and block in one purchase and looked at HP 3PAR, which isn't a player anymore. We made a small investment in Nimble. Before Dell's acquisition, we were a large EMC shop, but their systems were monolithic, hard to scale, and slow. XtremIO caused us panic when we realized we couldn't upgrade the small nodes, forcing complete replacements for capacity increases. It felt like a trap. My prior NetApp experience and my manager's knowledge offered a contrast: with NetApp, we could keep adding capacity until we reached system limits, without being forced into a complete repurchase. It allowed organic growth and scaling. This has proven true, winning over skeptical colleagues who now agree it's a no-brainer.
What other advice do I have?
I would rate NetApp AFF a ten out of ten. It's a product that keeps us coming back, and I've been managing NetApp products on and off for almost 15 years. It's grown and matured significantly but has never lost its core fundamentals: ease of management, support, and flexibility within our data center. While other products might offer bleeding-edge speed, they can be challenging to manage or incredibly expensive. NetApp AFF remains manageable and relatively affordable. The first company I worked for chose NetApp because it was cost-effective and reliable and served them well. In my experience, NetApp AFF has always been a dependable and high-functioning solution.
We plan to expand our use of the solution in the future. We're currently considering NetApp to support our backup solution. We use Commvault for backup for VMware and most other systems not on the NAS. As a result, we're investigating either new E Series deployments, chosen for their stability and hardened significantly, or NetApp C Series ASAs as the backing storage for Commvault. We expect these changes to result in greater resiliency, reliability, and scalability, especially with the C Series. It's not just about backing up data; almost every backup vendor can do that well. The focus is on how quickly you can restore data in a crisis. We want to be confident that our backup suite won't be the bottleneck in a rapid recovery, even if we need to restore 600 virtual machines at once. It's fine if the landing zone or VMware is the limiting factor. It's a game of moving the bottleneck: previously, backups were fast, but restoration took days. Now, being able to restore 100 or 200 virtual machines within an hour is a significant achievement, something that was unheard of or prohibitively expensive five or ten years ago. Our goal is to restore 1,000 virtual machines within a day, which is challenging but achievable when our environment has 1,500 virtual machines. Restoring roughly two-thirds of them within 24 hours without relying on SnapRestore or similar native array capabilities is a major accomplishment. Being able to recover data from our last resort within a day is the ultimate goal.
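The restore targets above reduce to simple rate arithmetic. The Python sketch below works through it, assuming a constant restore rate (an assumption; real restores rarely run at a steady pace). The function names are hypothetical, and the inputs are the figures quoted in the review.

```python
def required_rate(vm_count: int, hours: float) -> float:
    """VMs per hour needed to restore vm_count within the given time."""
    return vm_count / hours


def hours_to_restore(vm_count: int, vms_per_hour: float) -> float:
    """Hours needed to restore vm_count at a constant rate."""
    return vm_count / vms_per_hour


# Goal from the review: 1,000 VMs within 24 hours.
print(f"{required_rate(1000, 24):.1f} VMs/hour needed")

# Crisis scenario from the review: 600 VMs at 100 or 200 VMs per hour.
for rate in (100, 200):
    print(f"600 VMs at {rate}/hour: {hours_to_restore(600, rate):.1f} hours")
```

Under that assumption, the one-day goal needs a sustained rate of about 42 virtual machines per hour, and a 600-machine restore at the quoted 100 to 200 per hour would take between three and six hours.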
For our next investment, we aim to increase security, scalability, and speed. While we typically prioritize two of these three factors, our current goal is to achieve all three.
Our immediate purchases will focus on cybersecurity and optimization. Within the next one to two years, we plan to develop an internal AI framework and transition to become a service provider for our clinics, customers, business units, and data analysts. We aim to offer a standardized framework to prevent them from independently procuring AI solutions that may be incompatible with our systems. By providing a supported framework, we can ensure compatibility and maintain our standards, avoiding the need to support potentially inadequate external solutions.