It’s rare that I write an article simply to educate. Most of the time I am attempting to articulate or justify a position, or simply rebutting someone’s nonsensical yammering. For a refreshing change, I thought I would take some time to educate you on the fundamentals of large-scale data storage. Many people think of “storage as a service” (now being called “cloud storage”) as a magic black box. At the end of the day, it is just bits on disks. And like all things, if you use enough of it, you can more than cover the cost of managing it yourself by simply eliminating your vendor’s margin (insourcing).

There are more and more services providing outsourced storage. The concept is simple: you upload a digital asset to the vendor (via some sort of API or tool), they return an identifying key of some sort (sometimes this key is provided by you, the uploader) and they store the asset for you. To retrieve the asset, you use a similar method to the one used to upload. In the simplest terms, you can think of it as a mapped network drive to which you can save assets, and later reconnect to retrieve them.
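The interaction pattern is easy to sketch. The toy, in-memory stand-in below only illustrates the put/get flow; the class and method names are hypothetical and do not reflect any particular vendor’s API.

    # A minimal sketch of the upload/retrieve pattern described above.
    import hashlib


    class SimpleObjectStore:
        """Toy in-memory stand-in for an outsourced storage service."""

        def __init__(self):
            self._assets = {}

        def put(self, data: bytes) -> str:
            # Some services derive the identifying key from the content
            # itself; others let the uploader choose it.
            key = hashlib.sha256(data).hexdigest()
            self._assets[key] = data
            return key

        def get(self, key: str) -> bytes:
            return self._assets[key]


    store = SimpleObjectStore()
    key = store.put(b"a digital asset")
    assert store.get(key) == b"a digital asset"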

By no means is this new technology. However, the burden of managing one’s own storage, combined with growing space requirements and the fear of loss due to insufficient redundancy, has driven people to want to make this particular problem someone else’s. Making this choice — to solve the problem yourself or to outsource — is always the outcome of several factors: cost, convenience, and safety.

Redundancy: The basics

Let’s take a look at the fundamentals of data storage. We all want our data to be safe. It’s pretty obvious that storing exactly one copy of the data isn’t safe, but it’s actually more complex than you would think — storing two copies doesn’t buy you much without taking a few extra steps.

Before we dive in and explore methods for keeping data safe across systems, we need to realize that one of our fundamental assumptions is invalid. We assume that when we write data to a disk, it will have no errors when we read it back. This assumption is fundamentally wrong. There’s this little evil thing called a bit error (basically, one of the zeros or ones that was written came back inverted). How often this type of error occurs is a probability called bit error rate (BER). The BER on modern spinning disks is usually around 10^-13 or 10^-14. Basically, for every 1 to 10 terabytes you write, one of the bits — when read — won’t equal what was written. A single erroneous bit might not matter for some types of data, but for others, such an error could be disastrous. We write a lot of data these days, and bit errors are silent, so the lesson here is: write checksums with your data.
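A checksum doesn’t prevent the bit error, but it turns a silent error into a detectable one. A minimal sketch of the idea, using SHA-256 here though any sufficiently strong checksum works:

    # "Write checksums with your data": keep a digest alongside each block
    # so that a silent bit error is caught when the block is read back.
    import hashlib


    def write_block(data):
        # In a real system the data and its digest are persisted together.
        return data, hashlib.sha256(data).hexdigest()


    def read_block(data, expected_digest):
        if hashlib.sha256(data).hexdigest() != expected_digest:
            raise IOError("checksum mismatch: silent bit error detected")
        return data


    block, digest = write_block(b"important data")
    corrupted = bytes([block[0] ^ 0x01]) + block[1:]   # flip a single bit
    read_block(block, digest)          # passes
    # read_block(corrupted, digest)    # would raise IOError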

The classic method of ensuring that data is safe is to store multiple copies on different physical media. Inside a single system, this can be accomplished with RAID1 (mirroring), which makes sure all data is on two physically separate disks. With a bit of (somewhat) clever math, we can take that same data, split it into a few pieces and store each piece on a different drive. We can then calculate a block of parity data, and store that on an additional drive. Retracing the same math backwards shows that we can lose any single disk in the set, and we’ll still be able to reconstruct our data. This is the basis for RAID5. Sometimes systems need to be resistant to multiple concurrent disk failures (hence the introduction of RAID6, which uses an erasure code such as Reed-Solomon).
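The parity math is easier to show than to describe. Here is a toy sketch of the RAID5 idea using XOR parity; real implementations operate on fixed-size blocks at the device level, not on Python byte strings:

    # XOR the data blocks to produce a parity block, then reconstruct any
    # single missing block from the survivors plus the parity.
    from functools import reduce


    def xor_blocks(blocks):
        return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


    data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # striped across three drives
    parity = xor_blocks(data_blocks)            # stored on a fourth drive

    # Drive 2 dies: rebuild its block from the remaining blocks and parity.
    rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
    assert rebuilt == data_blocks[1]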

None of these schemes is designed to reduce the risk of data corruption. Rather, they are designed to prevent data loss due to the hardware failure of one or more underlying disks. One issue with using RAID is that you are storing files on a set of drives: those files consist of chunks of data (blocks) that map to physical blocks of bits on the drives, and somewhere along that path we could lose our way. If a specific physical block goes bad, or somehow becomes unreadable, we can’t easily map it back to a logical object, such as a file. We only find out that there’s a problem when we try to read the object. Another problem with this general technique is that all of these disks live in a single system, and if that system fails, all of the data is unavailable (or worse, lost).

So, RAID is designed to keep our data somewhat safe within a single system, but it doesn’t address system failures. The most obvious design is to store a copy of each asset on two systems. There are pros and cons with this approach. On the positive side, once we’ve identified which system holds a copy of our asset, we only need to communicate with that single system to retrieve a copy of the asset — simplicity. The downside here is that we’ve used half of our storage as redundancy, and yet if two of our N nodes fail, we’ve necessarily made unavailable (or permanently lost) 2/(N*(N-1)) of our assets. With two nodes, this works out to 100% (of course), and with 10 nodes, it’s around 2%.
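That fraction is simply the probability that both of an asset’s copies landed on the particular pair of nodes that failed, which is easy to check (assuming assets are spread uniformly across pairs of nodes):

    # Fraction of assets lost when two specific nodes out of N fail, given
    # that each asset lives on a random pair of nodes: 1 / C(N, 2).
    from math import comb


    def fraction_lost_two_copies(n_nodes):
        return 1 / comb(n_nodes, 2)     # equivalently 2 / (N * (N - 1))


    print(fraction_lost_two_copies(2))    # 1.0   -> 100% of assets
    print(fraction_lost_two_copies(10))   # ~0.022 -> roughly 2% of assets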

Taking a different approach altogether allows us to use half of our storage for redundancy, while maintaining dramatically greater availability.

Erasure codes

Today’s peer-to-peer systems achieve high availability of assets in the face of system failures. Their technical descriptions are clear-cut, yet extremely detailed. By using erasure codes, these systems are able to split data into many pieces (similar to RAID5), but instead of calculating simple parity, they calculate unique erasure codes.

Imagine we split our data into 5 pieces, and then calculate 5 additional pieces of data, any of which could be used to reconstruct any of the original 5 pieces were they found to be unavailable — these are erasure codes. So, with the data in 5 fragments + 5 erasure fragments, we’ve consumed twice the space but can now stand to lose any five pieces before the data becomes unavailable and/or lost. The main drawbacks to such a system are that calculating and distributing erasure codes is much more complicated than simply storing two copies of the same data, and that retrieving data requires contacting at least 5 machines to serve an asset.
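To make the idea concrete, here is a toy (n, k) erasure code matching the 5 data + 5 coded layout above, in which any 5 of the 10 fragments reconstruct the data. It works symbol by symbol over the prime field GF(257) via Lagrange interpolation; real systems use Reed-Solomon codes over GF(2^8) and are vastly more efficient, so treat this purely as an illustration of the principle.

    P = 257  # a prime just larger than the range of a byte


    def _interpolate(points, x):
        """Lagrange-interpolate the points over GF(P) and evaluate at x."""
        total = 0
        for i, (xi, yi) in enumerate(points):
            num, den = 1, 1
            for j, (xj, _) in enumerate(points):
                if i != j:
                    num = num * (x - xj) % P
                    den = den * (xi - xj) % P
            total = (total + yi * num * pow(den, -1, P)) % P
        return total


    def encode(data, k=5, n=10):
        """Split data (length must be a multiple of k) into n fragments."""
        fragments = [[] for _ in range(n)]
        for start in range(0, len(data), k):
            chunk = data[start:start + k]
            # Treat the k bytes as a polynomial's values at x = 0..k-1 and
            # evaluate that degree-(k-1) polynomial at x = 0..n-1.
            points = list(enumerate(chunk))
            for x in range(n):
                fragments[x].append(_interpolate(points, x))
        return fragments  # fragment x would live on node x


    def decode(available, k=5):
        """Rebuild the data from any k (fragment_index, symbols) pairs."""
        out = []
        for c in range(len(available[0][1])):
            points = [(x, symbols[c]) for x, symbols in available[:k]]
            out.extend(_interpolate(points, x) for x in range(k))
        return bytes(out)


    original = b"hello world of storage!!!"       # 25 bytes = 5 chunks of 5
    fragments = encode(original, k=5, n=10)
    # Lose any five fragments (say nodes 0, 2, 4, 6 and 8), recover anyway.
    survivors = [(x, fragments[x]) for x in (1, 3, 5, 7, 9)]
    assert decode(survivors, k=5) == original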

This erasure code approach assumes a slightly larger network of servers. With two copies spread across 100 machines, we see 99.8% availability of assets after 5 machine failures. With a 10-fragment (5 data + 5 coded) scenario, if 5 nodes fail, we maintain 100% availability. In the pathological case where 50 of our 100 nodes fail, the two-copy method would result in an availability of approximately 75.3%, whereas the erasure code method would achieve approximately 98.7% asset availability.
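Figures like these can be estimated with straightforward combinatorics: an asset becomes unavailable only when enough of the nodes holding its pieces happen to be among the failed ones. A small sketch of that calculation (the function name and parameters are mine, chosen for illustration):

    # If f of m machines fail, an asset is unavailable when more than
    # max_lost of the nodes holding its pieces are among the failed ones
    # (a hypergeometric tail probability).
    from math import comb


    def p_unavailable(m, f, pieces, max_lost):
        """P(more than max_lost of an asset's pieces sit on failed nodes)."""
        total = comb(m, pieces)
        return sum(comb(f, k) * comb(m - f, pieces - k)
                   for k in range(max_lost + 1, min(f, pieces) + 1)) / total


    m = 100
    # Two full copies: 2 pieces, tolerate the loss of 1.
    print(1 - p_unavailable(m, 5, pieces=2, max_lost=1))    # ~0.998
    # 5 data + 5 coded fragments: 10 pieces, tolerate the loss of 5.
    print(1 - p_unavailable(m, 5, pieces=10, max_lost=5))   # 1.0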

Back to reality

In peer-to-peer systems, where clients enter and leave the network rapidly, the use of erasure codes for high redundancy is quite necessary. However, in a datacenter environment, with redundancy on each system and maintenance windows that we control, the situation is entirely different. Controlling the servers, their configuration and their region of deployment gives us a landscape on which we can build a sufficiently redundant system with all sorts of advantages.

Reduced system complexity and simple distributed processing are significant advantages that result from having whole data objects, like images or documents, present on a single node. With this model, we can offload some computational processing to the nodes that hold the data, and those nodes can act without consuming the additional resources, such as CPU time and network bandwidth, required to reconstitute whole objects from their distributed pieces.
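As a sketch of that advantage (the names here are hypothetical): when a node holds the complete asset, work such as checksumming or summarizing can run where the data lives and ship back only a small result; with erasure-coded fragments, the same task would first require fetching enough fragments from other nodes and reconstituting the object.

    # Hypothetical sketch: a task shipped to the node that already holds
    # the whole object runs against a local read, no network reads needed.
    import hashlib


    def summarize_locally(local_assets, key):
        """Runs on the node storing the complete asset."""
        asset = local_assets[key]                  # local disk/memory read
        return hashlib.sha256(asset).hexdigest()   # small result shipped back


    node_assets = {"photo-123": b"...image bytes..."}
    print(summarize_locally(node_assets, "photo-123"))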

At the end of the day, a hybrid/adaptive approach between the two would yield the best outcome. I see that being the next thing in distributed storage. Most of us who are faced with storing large amounts of data have already thrown traditional filesystems and POSIX compliance to the wind and are looking for fresh, more appropriate solutions to our specific problems.

For now, until the two approaches merge, redundantly storing whole assets makes the most sense. It is simple and easy to build, deploy and administer. It is also trivially easy to understand and troubleshoot.