it_user587637 - PeerSpot reviewer
Infrastructure Manager with 1,001-5,000 employees

IBM vs. EMC vs. Hitachi Compression

The IBM FlashSystem V9000's claimed compression ratio is a lot higher than EMC XtremIO's and Hitachi's.

IBM claims around 4.5:1, whilst EMC and Hitachi guarantee around 2:1 for Oracle workloads on AIX.

The IBM guarantee runs for 90 days from implementation, whilst the EMC guarantee lasts for the duration of the maintenance contract.
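To put those guarantees in perspective, the difference between 4.5:1 and 2:1 is simple arithmetic. A minimal sketch (the 20 TB array size is an illustrative assumption, not a figure from this thread):

```python
def effective_capacity_tb(physical_tb: float, reduction_ratio: float) -> float:
    """Logical capacity delivered by a given physical capacity
    at a given data reduction ratio."""
    return physical_tb * reduction_ratio

# Illustrative: the same 20 TB of physical flash under each vendor's ratio.
physical = 20.0
print(effective_capacity_tb(physical, 4.5))  # IBM's claimed 4.5:1 -> 90.0 TB
print(effective_capacity_tb(physical, 2.0))  # EMC/Hitachi 2:1 guarantee -> 40.0 TB
```

In other words, if the 4.5:1 claim holds for your data, the same hardware stores more than twice as much; if it does not, the sizing is off by the same factor.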

How true is this IBM compression figure for general Oracle on AIX?

10 Answers
it_user127386 - PeerSpot reviewer
User at Hitachi Data Systems
Real User
Jan 26, 2017

To IT Central Station and prospective all-flash array (AFA) buyers:

Thank you for reaching out on this topic; these vendors' claims are confusing to many uninitiated buyers. At face value, it appears that some technologies could perform much better based on their messaging. I posted a blog on this topic here: https://community.hds.com/community/products-and-solutions/storage-systems/blog/2016/08/01/many-all-flash-array-data-reduction-guarantees-are-sounding-like-us-republican-presidential-nomination-politics

Note that the bases for vendors' data reduction claims vary greatly, as some vendors choose to include benefits from thin provisioning and snapshots in their factoids (aka: alternative facts). Keep in mind that any "up to" reduction value is just that: a value achieved in a lab or on a unique workload, not representative of an average.

In this case, both EMC and Hitachi chose to represent average compression results from prior deployments; no deduplication value is included. IBM provided a best case, as this V9000 model does not support deduplication; consequently, it creates the impression that the V9000 delivers a superior result even compared with systems that do support deduplication!

The variation in data reduction (compression and/or deduplication) results is mainly a function of the data set, not the vendor's technology, as engineers are limited to similar latency overheads. Here is a sample of typical Hitachi Storage Virtualization Operating System (SVOS) compression and deduplication results by workload that Gartner validated:


Hitachi benchmarked both the IBM and EMC AFAs internally on eight different workloads, and the data reduction results came in within 5% of each other. Data reduction performance will also be a function of the data chunk size; as an example, a 4KB chunk-size engine can achieve about 5-10% better results than an 8KB engine. Note that this extra saving comes at a performance and memory cost.
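The chunk-size effect can be explored with any general-purpose compressor. Below is a rough sketch using zlib as a stand-in for an array's compression engine: each chunk must be compressed independently (as a block storage engine must do), so the chunk size bounds the compressor's context. The sample data is synthetic, and real ratios depend entirely on your data set; note too that for pure compression, smaller chunks typically lose a little ratio to per-chunk overhead, so the 5-10% gain quoted above comes chiefly from finer-grained deduplication matching.

```python
import zlib

def chunked_ratio(data: bytes, chunk_size: int) -> float:
    """Compression ratio when data is compressed independently per chunk,
    as a block storage data reduction engine must do."""
    compressed = sum(
        len(zlib.compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )
    return len(data) / compressed

# Synthetic, repetitive sample data -- purely illustrative.
sample = b"customer_record;region=EMEA;status=active;" * 4096
for size in (4 * 1024, 8 * 1024, 64 * 1024):
    print(f"{size // 1024}KB chunks: {chunked_ratio(sample, size):.2f}:1")
```

Running an experiment like this against a representative slice of your own data is a cheap first sanity check before trusting any vendor's headline ratio.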

So how do you proceed as a prospective buyer to assess the value of each technology? I recommend using each vendor's data reduction estimator tool on your own data set and drawing your own conclusions. The Hitachi estimator tool can be downloaded here: https://hcpanywhere.hds.com/u/9xzMSp4czIPlHMxK/hidr_estimator-02_1_0.zip?l ; the data reduction estimators are built with the same algorithms as their AFA counterparts.

As for performance impact, I would recommend reading this blog: https://community.hds.com/community/products-and-solutions/storage-systems/blog/2016/11/03/flashbuyer . In the real world, you can't have your cake and eat it too…

Full disclosure: I work for Hitachi Data Systems and support the Hitachi enterprise flash business.

I hope this information helps.

Patrick Allaire

it_user587616 - PeerSpot reviewer
Storage Team Manager and Capacity Manager with 1,001-5,000 employees
Jan 30, 2017

The IBM V9000 is just a bundled SVC + FlashSystem 900. We have used SVC Real-time Compression for several years now, mainly with Oracle databases on AIX and Solaris. We do not use the V9000 because SVC is a more stable and flexible solution.

We recently purchased an HDS G600, so I will have a direct comparison between SVC RTC and the G600 within two months.

You may find this helpful:

IBM uses an improved compression algorithm based on Lempel and Ziv's LZ78, while HDS uses a derivative of LZ77. Since the algorithms are closely related, you should expect a fairly similar compression ratio for the same kind of data.

IBM Real-time Compression is very handy, and we use it widely for warehouse databases with ratios around 3:1. It is a very flexible and reliable solution, but I would never put OLTP systems on it.

The main reason is that IBM RTC adds quite a significant delay (1-2 ms at least, depending on the actual load and the data itself), and it is a performance bottleneck for the IBM nodes. In the case of SVC with DH8 nodes, the difference was 50k IO/s with RTC versus 250k without; but since IBM uses dedicated hardware for compression, the RTC traffic does not influence the rest of the traffic, so you can have 50k RTC and 200k non-RTC traffic simultaneously. The V9000 will surely behave similarly; check it during the POC.

IBM RTC is a "per-LUN" approach and has a limitation of a maximum of 512 volumes per SVC I/O group. You should check how this applies to the V9000.

The HDS G-series has compression on FMD DC2 modules only, while IBM compression is "disk independent".

HDS compression is "global", and you cannot disable it.

HDS compression is always on, and so is its influence on G-series performance. The good news is that HDS FMD DC2 compression is built into the flash modules and, unlike IBM RTC, introduces very little delay. Hitachi claims the G-series can deliver hundreds of thousands of IO/s with 0.2-0.5 ms response times. That is what we are going to test over the following two months.

Good luck with your POC.

it_user238902 - PeerSpot reviewer
Infrastructure Architect at a retailer with 1,001-5,000 employees
Real User
Jan 26, 2017

I held off on responding to this question initially, but wanted to weigh in after seeing some of the comments that have come up.

We do not have an Oracle workload, so I cannot compare apples to apples; we are a pure SQL Server shop for our ERP and other database workloads. So I will refrain from replying to the specific Oracle question and respond more to the V9000 aspect you have raised.

Before I continue, I like to approach the question from the true end goal, if you are able to discuss it: what is more important, data reduction, latency, burst IO, or a balance of all three? There was no mention of performance in your question, but comparing a V9000 to an XtremIO makes me ask how you are comparing two differently purposed arrays. The V9000 is an all-purpose AFA with limited data reduction features; XtremIO is a niche product designed for extremely low latency, but it has many limitations outside of smaller data sets which do not dedupe well.

We were in the middle of a SAN refresh and going through the motions of finding the right fit for our specific workload on an AFA. IBM approached us initially with a V7000 as their recommendation, EMC with XtremIO. After the initial environmental questions, IBM went on their merry way; they were then promptly back, pitching us a V9000 after speaking to some of their more seasoned engineers (they claimed).

IBM's claim on V9000 compression is quite boastful, but unrealistic if you have a disparate data set; many have already spoken on this. The V9000 was pitched to us to replace an aging EMC VNX; EMC pitched XtremIO. NEITHER was the chosen successor, because of issues with scalability (XtremIO) or performance based on poor sizing recommendations from their SEs (V9000).

The V9000 is heavily reliant on very careful spec'ing, and there is no true way of answering the generic question without understanding your data set. You have to get a signed-off guarantee after they run the "Comprestimator" against your workload (which you mentioned they did). We also checked with them for SQL Server for our ERP and for our VM workload as well. And, as mentioned, they cover you only for the first 90 days; then you are on your own. Sight unseen, they promise the same 2:1 as the competitors mentioned in other posts. But here are the issues that came up long before they could even offer a PoC in our datacenter.
• They miscalculated our entire workload, multiple times.
o They asked for forgiveness and promised to get it straight; they still got it wrong, and they never left the runway.
o They had the appropriate data reduction but not the performance: no dedupe and only compression on the V9000, promising dedupe would come soon, but at the cost of waiting for the next generation to mitigate the latency impact on our workload. Sizing showed an expectation of 2.5-3 ms service times for our ETL job and some other tasks, while the sales team kept promising <1 ms. This would have badly impacted our ERP transacting users and was only caught by my company's due diligence; they were hoping it would work for 90 days and then be out of the guarantee. This was the immediate deal breaker, BUT THEY GUARANTEED their data efficiency, so be cautious for your enterprise's sake.
o Reliability has come up as a contentious point in many discussions with my peers in the industry locally; there have been a few recorded large outages, and there is now an issue of brand damage locally.

Otherwise, their generalizations are just that. EMC and Hitachi play the other game of "assume the worst, and anything better is gravy", but seem to keep the performance aspect in mind.

As to the EMC (now Dell) XtremIO, be VERY wary of that solution.
• For data sets under the size of an EMC half or full X-Brick, it will give amazingly low latency, but nothing else if your data does not fit XtremIO's requirements.
o Upgrades are super expensive and come in half-brick and full-brick sizes.
• EMC has been pushing alternatives since the Dell acquisition; we even had people pushing the Unity AFA, since it's more scalable and is a Dell-rebranded VNX (if you take the time to look under the hood). You cannot scale to larger drives with XtremIO, and EMC has been exceptionally tight-lipped about this.
• Scale-out is your only option; no scale-up as a result.
• Attempts to change the architecture to newer NAND capacities to address scalability have issues.
o Some code upgrades are data-destructive, and they still could not move XtremIO to larger-capacity drives last time I checked.
• Take into account 8 Gb Fibre Channel and no support for larger TLC flash drives due to the above.

ROI and TCO:
• This is a very pointed niche device for small, specific data sets; it will offer extremely low latency and high performance at the cost of zero scale-up and extremely expensive scale-out.
• TCO is extremely high in cost per TB as a result, and even the ROI is quite hard to justify unless you have the appropriate data set that needs that latency.
I have no experience with Toshiba to speak on their behalf and am neutral on most items regarding this.

So, without understanding the more intimate details of your working set, and which is more important (data efficiency, overall capacity, or performance), this is the best I can answer. But I can tell you that for a less burdensome SQL Server solution compared to most Oracle loads, we still chose a different vendor: neither IBM nor EMC, based on the information gathered in the industry and candid talks with their SEs during our selection process.

If you aren't as worried about total cost of ownership or return on investment, or just need killer speed, there are many other solutions out there for a cost. My enterprise is exceptionally TCO-focused, and while latency is quite sensitive for our end users, cost per TB is extremely important, so I understand where the data efficiency question comes in. Without dedupe, the V9000 really destroyed the cost case, as it required many more TB of physical disk compared to an XtremIO for us; but our entire load didn't fit into a full X-Brick, and the scale-out made it cost-prohibitive and completely unjustified for what we would receive.
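The cost point above is easy to quantify: without dedupe, the physical capacity you must buy grows directly with the missing reduction factor. A sketch with purely illustrative numbers (the 100 TB working set and the ratios are assumptions, not figures from this thread):

```python
def physical_tb_needed(logical_tb: float, compression: float,
                       dedupe: float = 1.0) -> float:
    """Physical capacity required to store a logical data set at the
    given compression and deduplication ratios (dedupe=1.0 means none)."""
    return logical_tb / (compression * dedupe)

logical = 100.0  # illustrative working set in TB
print(physical_tb_needed(logical, compression=2.0))              # compression only: 50.0 TB
print(physical_tb_needed(logical, compression=2.0, dedupe=2.0))  # with 2:1 dedupe: 25.0 TB
```

Doubling the combined reduction ratio halves the physical TB you buy, which is why an array without dedupe can lose the cost-per-TB comparison even when its compression alone is competitive.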

I hope my two cents helped. If you have any further questions about my specific experience, including our chosen solution, do not hesitate to ask.

I do not work for any storage vendor and have nothing to gain by giving honest experiences of any array.

Jason Melmoth
Infrastructure Architect

it_user549372 - PeerSpot reviewer
User at a tech company with 51-200 employees
Jan 26, 2017

The worst storage I have tested in 15 years.
Very slow with compression enabled (300,000 IOPS without compression, 150,000 IOPS with compression).
Faster with the write cache disabled (!!!???)
We also had a crash and 8 hours of downtime!
Be careful with the V9000... Do not buy it without a POC.

it_user587637 - PeerSpot reviewer
Infrastructure Manager with 1,001-5,000 employees
Jan 26, 2017

We have run the Comprestimator tool, and this is how IBM came forward with the guarantee. The POC is in the process of being set up.

This is the first time I have posted on this site, and I am truly grateful for all of the feedback. If there is anything more regarding the V9000, please feel free to share your experiences: likes, dislikes, gotchas, etc.

Thank you.

it_user450228 - PeerSpot reviewer
Project Manager / Lead Architect with 51-200 employees
Jan 26, 2017

I agree with Martin that every database is going to give you a different result, so I would recommend using the Comprestimator tool to check what you can expect. I have just done a sizing exercise for a customer using this tool, and it is very accurate. Not an AIX environment, though, but 3 x SQL databases where we are getting 4:1. I have only ever seen 4.5:1 on a Pure m20 array with Oracle on Windows.


it_user594567 - PeerSpot reviewer
Technological Director. Big Data, Advanced Analytics and Geospatial Systems at a tech services company with 1,001-5,000 employees
Jan 25, 2017

IBM doesn't support deduplication in the V9000 yet; it's on the roadmap for 2017. As Yannis said, you can run Comprestimator (a free tool) on servers attached to the LUNs and check the compression/thin provisioning ratio achieved. The ratios achieved with this tool are like a contract: IBM guarantees these ratios in a real SVC or V9000 deployment.

it_user594891 - PeerSpot reviewer
Infrastructure and Database Architect at PB IT Pro
Jan 25, 2017

The only thing I've heard of the V9000 doing well is latency (microseconds) versus XtremIO's "sub-millisecond" latency. IBM's previous compression was based on RISC cards, and compression alone was about 2.3-3:1 for my RAC.
Nimble's compression + dedupe works very well for us at 2.4:1 (we moved off an older IBM 840). Features alone (zero-copy clones, snapshots, etc.) increased our efficiency enough to justify giving up some latency. http://info.nimblestorage.com/rs/nimblestorage/images/top-ten-reasons-nimble-storage-for-oracle.pdf If you dedupe your RAC, you really need to buy multiple Nimble arrays to maintain fault zones (if that was the goal of your RAC).

it_user208149 - PeerSpot reviewer
Presales Technical Consultant Storage at Hewlett Packard Hellas Ltd.
Jan 25, 2017

I totally agree that it is difficult to estimate the reduction level, and I agree with Mr. Martin Pouillaude that several factors can influence it. What I would recommend is that either an IBM Business Partner or an IBM engineer run a tool/utility IBM has, called Comprestimator. This tool will provide a more accurate prediction of the reduction level. It does not install anything on the production Oracle system; it just analyzes the filesystem and uses the same algorithms that run when compression is enabled on the hardware (V9000). The numbers the customer gets from Comprestimator are the values that IBM can commit to in writing as well.
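For intuition about how a scan-only estimator can work without touching production data, here is a hypothetical sketch: it samples random blocks of a file read-only and compresses them to project a ratio. The real Comprestimator uses the array's own compression algorithm, which is why IBM will stand behind its numbers; zlib here is only a stand-in, so the ratios it produces will differ.

```python
import os
import random
import zlib

def estimate_compression_ratio(path: str, sample_blocks: int = 64,
                               block_size: int = 32 * 1024) -> float:
    """Estimate a file's compressibility by compressing randomly sampled
    blocks. Read-only: nothing is installed or written, mirroring the
    scan-only approach described above."""
    size = os.path.getsize(path)
    raw = compressed = 0
    with open(path, "rb") as f:
        for _ in range(sample_blocks):
            f.seek(random.randrange(max(1, size - block_size + 1)))
            block = f.read(block_size)
            if not block:
                continue
            raw += len(block)
            compressed += len(zlib.compress(block))
    return raw / compressed if compressed else 1.0
```

Pointing an estimator like this at the actual database files (or LUN device) is what makes the prediction data-set-specific rather than a marketing average.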

it_user572694 - PeerSpot reviewer
Regional Sales Manager at Pure Storage
Jan 25, 2017

It's always hard to predict precisely the data reduction level that can be obtained on Oracle databases. One key factor is to understand whether the database is already compressed or even encrypted; if so, the 4.5:1 ratio would be extremely hard to achieve.
IBM's documentation states a reduction factor of 4:1 for databases.
A 4.5:1 number seems a bit on the optimistic side. Generally, I would recommend doing a POC on the production data, if possible, to understand the accurate number, and of course not counting thin provisioning as data reduction.
