STORAGE REQUIREMENTS FOR DEEP LEARNING
Deep learning workloads are a special kind of beast: effectively all DL data is hot data, which rules out any conventional tiered storage management solution. The SSD tiers normally sized for hot data under conventional conditions simply can't sustain the millions, billions, or even trillions of small file and metadata transfers a training run performs as the model learns to classify unseen inputs from a limited set of labeled examples.
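To make that scale concrete, here is a rough back-of-envelope sketch in Python. The dataset size, epoch count, and file size are illustrative assumptions for an ImageNet-scale job, not figures from the text:

```python
# Rough estimate of the storage operations generated by one training job.
# All numbers below are illustrative assumptions, not measurements.

num_samples = 1_280_000   # e.g., an ImageNet-1k scale training set
epochs = 90               # a common training schedule
avg_file_kb = 110         # typical small JPEG sample

reads = num_samples * epochs          # every sample is re-read each epoch
total_tb = reads * avg_file_kb / 1e9  # KB -> TB (decimal)

print(f"{reads:,} random small-file reads")   # ~115 million reads
print(f"~{total_tb:.1f} TB read in total")    # dominated by small random I/O
```

One job at this modest scale already issues over a hundred million small reads; larger corpora and longer schedules push the totals into the billions.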
Below are a few of the storage requirements needed to keep those data volumes from becoming the bottleneck.
COST EFFICIENCY
Enormous AI data sets become an even bigger burden if they don't fit within the budget set aside for storage. Anyone who has managed enterprise data for any length of time knows that highly scalable systems have traditionally commanded a premium on a cost-per-capacity basis. A deep learning storage system must be both affordable and scalable to make sense.
PARALLEL ARCHITECTURE
To avoid the choke points that starve a deep learning machine of data, it's essential for the storage serving those data sets to have a parallel-access architecture, so that many clients can read concurrently instead of queuing behind a single controller.
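On the client side, training frameworks are built to exploit exactly this kind of parallelism. A minimal sketch, assuming PyTorch (the framework is my choice for illustration; nothing in the text mandates it), where several worker processes issue reads against the storage backend at the same time:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SampleDataset(Dataset):
    """Stand-in dataset; in practice __getitem__ would read one file from storage."""
    def __len__(self):
        return 10_000

    def __getitem__(self, idx):
        # Placeholder for a small random read (e.g., one JPEG sample).
        return torch.randn(3, 224, 224), idx % 1000

if __name__ == "__main__":
    loader = DataLoader(
        SampleDataset(),
        batch_size=256,
        shuffle=True,     # random access defeats readahead and cache tiering
        num_workers=8,    # eight worker processes read from storage in parallel
        pin_memory=True,  # faster host-to-GPU copies
    )
    for images, labels in loader:
        pass  # a real training step would consume the batch here
```

If the storage system can only serve one stream at a time, those eight workers simply queue up and the GPUs sit idle; parallel-access storage is what lets this pattern pay off.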
DATA LOCALITY
While many organizations may opt to keep some of their data in the cloud, most of it should remain on-site in a data center, for at least three reasons: regulatory compliance, cost efficiency, and performance. For that argument to hold, though, on-site storage must rival the cost of keeping the data in the cloud.
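A quick way to sanity-check that claim is a simple cost model. Every price below is a placeholder assumption for illustration only; real cloud list prices and on-prem amortization vary widely:

```python
# Toy monthly cost comparison for 1 PB of training data.
# Every rate here is an assumed placeholder, not a quoted price.

capacity_gb = 1_000_000          # 1 PB

cloud_storage_per_gb = 0.023     # assumed object-storage rate, $/GB-month
cloud_egress_per_gb = 0.09       # assumed egress rate, $/GB
egress_factor = 2.0              # data set read out of the cloud twice a month

onprem_capex_per_gb = 0.20       # assumed hardware cost, $/GB
amortization_months = 48         # 4-year depreciation
onprem_opex_monthly = 8_000      # assumed power/cooling/admin, $/month

cloud = capacity_gb * (cloud_storage_per_gb + egress_factor * cloud_egress_per_gb)
onprem = capacity_gb * onprem_capex_per_gb / amortization_months + onprem_opex_monthly

print(f"cloud:   ${cloud:,.0f}/month")   # egress dominates for read-heavy DL
print(f"on-prem: ${onprem:,.0f}/month")
```

Under these assumptions the read-heavy access pattern of training is what tilts the math: it is the repeated egress, not the storage itself, that makes the cloud expensive.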
HYBRID ARCHITECTURE
As touched on above, different types of data have unique performance requirements. Storage solutions should therefore offer the right mixture of storage technologies, flash for hot, latency-sensitive data and disk for bulk capacity, rather than a one-size-fits-all design that eventually fails on either performance or cost. It's all about meeting ML storage performance and scalability requirements simultaneously.
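As a hedged illustration of what such a hybrid layout might look like in practice, here is a minimal placement policy. The tier paths, the one-week "hot" threshold, and the routing rule are all hypothetical:

```python
import time

# Hypothetical mount points for the two tiers of a hybrid platform.
FLASH_TIER = "/mnt/nvme"  # low latency, expensive per TB
HDD_TIER = "/mnt/hdd"     # high capacity, cheap per TB

HOT_WINDOW_SECONDS = 7 * 24 * 3600  # assumed definition of "hot": touched within a week

def choose_tier(last_access_epoch, now=None):
    """Route recently accessed data to flash and everything else to disk."""
    now = time.time() if now is None else now
    return FLASH_TIER if now - last_access_epoch < HOT_WINDOW_SECONDS else HDD_TIER

# Example: a file last read 30 days ago lands on the HDD tier.
print(choose_tier(time.time() - 30 * 24 * 3600))  # -> /mnt/hdd
```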
SOFTWARE-DEFINED STORAGE
Not all huge data sets are the same, especially in DL and ML. While some workloads can get by with the simplicity of pre-configured machines, others need hyper-scale data centers built around purpose-built server architectures. This variability is what makes software-defined storage solutions, where the storage logic is decoupled from the underlying hardware, the best option.
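To make the "software-defined" idea concrete, here is a minimal sketch of the abstraction involved. The interface and backend names are invented for illustration; real SDS products each define their own:

```python
from typing import Protocol

class StorageBackend(Protocol):
    """Anything the management layer can drive, regardless of vendor hardware."""
    def provision_volume(self, name: str, size_gb: int) -> str: ...

class NvmePool:
    def provision_volume(self, name: str, size_gb: int) -> str:
        return f"nvme://{name} ({size_gb} GB)"

class HddPool:
    def provision_volume(self, name: str, size_gb: int) -> str:
        return f"hdd://{name} ({size_gb} GB)"

def provision(backend: StorageBackend, name: str, size_gb: int) -> str:
    # The policy lives in software; the hardware behind `backend` is interchangeable.
    return backend.provision_volume(name, size_gb)

print(provision(NvmePool(), "training-scratch", 2_000))
print(provision(HddPool(), "cold-archive", 50_000))
```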
Our X-AI Accelerated is an any-scale DL and ML solution that offers unmatched versatility for any organization's needs. X-AI Accelerated was engineered from the ground up and optimized for "ingest, training, data transformations, replication, metadata, and small data transfers." On top of that, RAID Inc. covers all the aforementioned requirements with the all-flash NVMe X2-AI/X4-AI and the X5-AI, a hybrid flash and hard drive storage platform.
Both the NVMe X2-AI/X4-AI and the X5-AI support parallel access to flash as well as deeply expandable HDD storage. Furthermore, the X-AI Accelerated storage platform can scale out from only a few TBs to tens of PBs.
We are looking into DataCore, Nexenta, and Tintri as prospective Software-Defined Storage (SDS) solutions. We were previously set on DataCore, but have been hearing some interesting things about these other two competitor solutions! Can you help us tease out which of these would be the best alternative for our business by sharing your experiences and opinions with our IT team of an excl...
There is no straight answer on this topic; each storage architecture is different. Since the whole premise of SDS is to abstract storage management from the hardware, does the vendor's software provide management for all of your storage hardware brands and all the types of storage you use? Does it meet both the end users' needs and IT's? Assuming you have a list of must-haves and nice-to-haves, all you can do is compare the candidates against it. If DataCore meets your needs and you and IT have done your due diligence, then go with that. There is always going to be someone with something "better".
Independent Analyst and Advisory Consultant at Server StorageIO - www.storageio.com
May 7, 2014
It depends; tell us more about your needs and requirements. Why are you looking at just those three, which represent three different approaches? Or did you evaluate three (or more) different approaches and these are the winners of those categories? How many virtual desktops will you be supporting, and what are their workload/application profiles? If you are looking for a turnkey converged solution with server, storage, basic connectivity, hardware, and software all in one, then of the three that would be Tintri; although if that is the route you are going, what about Nutanix, SimpliVity, etc.? On the other hand, if you are just looking for a ZFS-based software solution for SAN/NAS to be deployed on your own hardware, there is Nexenta, as well as CloudByte and SoftNAS, not to mention other variations from the likes of StarWind and Open-E. And if you are looking for storage virtualization, there is DataCore, which has been around for a while, though there are others here too, including the ZFS-based options, StarWind, Open-E, EMC ViPR, and many more.
So knowing what problem needs to be solved will yield the answer of what is applicable or best for that given scenario.
Storage requirements depend on several factors: for example, the language you will use, the data you will receive, whether you use RAID, etc.
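Since the RAID choice alone changes how much raw capacity you have to buy, here is a small sketch of the usable-capacity arithmetic. These are the standard RAID formulas; the drive count and size are example inputs:

```python
def usable_tb(drives: int, drive_tb: float, level: str) -> float:
    """Usable capacity for a few common RAID levels (standard formulas)."""
    if level == "RAID0":
        return drives * drive_tb        # striping, no redundancy
    if level == "RAID1":
        return drives * drive_tb / 2    # mirrored pairs
    if level == "RAID5":
        return (drives - 1) * drive_tb  # one drive's worth of parity
    if level == "RAID6":
        return (drives - 2) * drive_tb  # two drives' worth of parity
    raise ValueError(f"unsupported level: {level}")

# Example: twelve 16 TB drives.
for lvl in ("RAID0", "RAID1", "RAID5", "RAID6"):
    print(f"{lvl}: {usable_tb(12, 16.0, lvl):.0f} TB usable")
```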
@Ariful Mondal,
Thanks for your response.