Speed, size, and security are the most valuable features.
This was the company’s first foray into storing data offsite. The old way of thinking was that this was too dangerous to contemplate, but you can get hacked either way.
Indirectly, this solution allowed us to reclaim expensive internal document storage and replace it with cheaper offline storage.
Capacity-wise, we’re looking at 200GB of transactional data in Redshift. More importantly, there is a lot of storage for other assets, some slow and some fast. These include document archives and web images. The documents span several years and total more than 500GB, and most of them will remain untouched.
That stuff ends up in S3 Glacier storage. It is not really that large in the grand scheme of things, but it certainly does not warrant the use of expensive internal storage systems or hiding the data on backup tape somewhere.
The web interface was frustrating when dealing with large numbers of files. We ended up using a client application (via FTP, I think), which had its own issues.
How do you make it easy to manage 50K documents in one folder using a web interface? I guess some more advanced filtering and selecting capabilities would have been nice, but it was in the early days. The interface would only read about 120 files into the cache, so if you wanted to remove 1,000 out of 2,000 documents, you had to repeat your actions over and over.
This came up surprisingly regularly, because we had a live data transfer that ships 100 files per cycle and runs 20 cycles per hour. The transfer could eventually have deleted the files itself, but we didn’t have time to engineer that piece.
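To put rough numbers on this, here is a minimal Python sketch of the arithmetic, plus the cleanup piece we never built. The figures (100 files per cycle, 20 cycles per hour, about 120 files per view) come from the description above. The function names are made up for illustration, and the pass count is a best case that assumes every pass clears a full view of unwanted files. The 1,000-key cap is the documented limit of the S3 DeleteObjects request.

```python
import math
from typing import Dict, Iterator, List

FILES_PER_CYCLE = 100         # from the review
CYCLES_PER_HOUR = 20          # from the review
UI_PAGE_SIZE = 120            # approximate number of files the web interface cached
S3_DELETE_BATCH_LIMIT = 1000  # S3 DeleteObjects accepts at most 1000 keys per request


def files_per_hour() -> int:
    """Files the live transfer drops into the bucket every hour."""
    return FILES_PER_CYCLE * CYCLES_PER_HOUR


def ui_passes_needed(files_to_remove: int, page_size: int = UI_PAGE_SIZE) -> int:
    """Best-case number of select-and-delete passes in the web interface."""
    return math.ceil(files_to_remove / page_size)


def delete_batches(keys: List[str],
                   batch_size: int = S3_DELETE_BATCH_LIMIT) -> Iterator[Dict]:
    """Split object keys into payloads shaped for the Delete= argument
    of boto3's S3 client method delete_objects."""
    for start in range(0, len(keys), batch_size):
        chunk = keys[start:start + batch_size]
        yield {"Objects": [{"Key": key} for key in chunk], "Quiet": True}


if __name__ == "__main__":
    print(files_per_hour())        # files arriving per hour
    print(ui_passes_needed(1000))  # passes needed to remove 1000 files by hand
```

Each payload from `delete_batches` could be passed to `delete_objects(Bucket=..., Delete=payload)` on a boto3 S3 client. The bucket name and the list of keys would have to come from your own setup.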
Amazon’s approach was to delete the old files after a certain number of days. That is money in the bank for them right there.
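For reference, deleting after a set number of days is expressed in S3 as a bucket lifecycle rule. The sketch below is illustrative only and is not the configuration we actually ran. The `incoming/` prefix, the 30-day window, the rule ID, and the bucket name are all hypothetical. `put_bucket_lifecycle_configuration` is the boto3 S3 client call that applies such a rule.

```python
from typing import Dict


def expiration_rule(prefix: str, days: int,
                    rule_id: str = "expire-transfer-files") -> Dict:
    """Build a lifecycle configuration that deletes objects under `prefix`
    once they are `days` old. The rule ID is a hypothetical name."""
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_rule(bucket: str, config: Dict) -> None:
    """Apply the configuration to a bucket. Needs boto3 and AWS credentials,
    so it is kept apart from the pure function above."""
    import boto3  # imported lazily so the rule can be built without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    # Hypothetical values: adjust the prefix and retention to your own data.
    print(expiration_rule(prefix="incoming/", days=30))
```

Calling `apply_rule("my-example-bucket", expiration_rule("incoming/", 30))` would then leave the cleanup to S3 rather than to a person clicking through the web interface.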
I have used this solution for three years.
There were no stability issues.
Technical support always met my expectations.
There were business intelligence solutions via the web. We had run similar home-grown reporting applications on in-house hardware for more than 10 years, but they were directly impacting our ERP resources.
It was exceptionally simple to configure servers, although most of this was done by the boss.
Even though it appears cheap, be careful about how you use it. Optimizing early will save money on storage and resources in the long term, so make it part of the design process. The beauty is that you can control it at a very fine level.
Follow the guidance. The documentation is excellent. Take the time to get it right.