Wednesday, July 11, 2012

What is object-based storage (and what it is not)

TL;DR Object-based storage is a term that categorizes the internal architecture of a file system; it is not a particular feature set or interface. While the internal architecture of a file system has many implications for its performance and features, its outer appearance remains that of a file system.

We have often stressed the fact that XtreemFS is an object-based file system. While talking to our users, however, we have realized that this term causes more confusion than enlightenment. I blame this poor choice on our academic ignorance, and I hope I can clear up the confusion a bit. In the end, "object" is not a very descriptive term, and most people associate it with object-oriented programming (totally unrelated) or the objects in Amazon's S3 system (only somewhat related).

In storage, an object is a variable-sized but limited container of bytes. You probably wonder why this trivial concept deserves its own term and how it became relevant to the storage community at all. There are mainly two aspects to this: first the name itself, then its main property, namely the fact that it is variable-sized.

Blocks, Blocks, Blocks

While storage hardware keeps a series of bytes, no storage hardware (disks, tapes, flash, even RAM) exports a byte-level interface. The reason is efficiency: addressing single bytes would require many long addresses (metadata overhead), and reading and writing single bytes is inefficient (think checksums, latency, seeking, etc.). The unit that is actually used is the block, a fixed-size container of bytes.

File systems organize blocks into larger, variable-sized containers. This is also true for distributed file systems. As many distributed file systems do not run on bare hardware, they can actually choose their block size. There is a wide range of file systems in which the block size is fixed for all files in the file system. Such a system is not very flexible: you need to choose a block size that fits all files, and in turn all your files should be of similar size. There was a saying about Google's GFS (a block-based file system with a 64MB block size): it can hold any set of files, as long as they're large and not too many.
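To make the "one block size fits all" problem concrete, here is a small back-of-the-envelope sketch in Python. It is a toy model, not how GFS or any other real system accounts for space: it assumes every file occupies whole blocks, and the function names blocks_needed and slack_bytes are made up for this illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def blocks_needed(file_size: int, block_size: int) -> int:
    """Number of fixed-size blocks needed to hold file_size bytes."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size < 0:
        raise ValueError("file_size must not be negative")
    # Ceiling division using integers only.
    return -(-file_size // block_size)


def slack_bytes(file_size: int, block_size: int) -> int:
    """Bytes left unused in the last block (toy model: whole blocks only)."""
    return blocks_needed(file_size, block_size) * block_size - file_size


if __name__ == "__main__":
    for file_size in (1, 10 * GIB):
        for block_size in (4 * KIB, 64 * MIB):
            print(
                f"file={file_size} B, block={block_size} B: "
                f"{blocks_needed(file_size, block_size)} blocks, "
                f"{slack_bytes(file_size, block_size)} B slack"
            )
```

In this model a 10 GiB file needs 2,621,440 block entries at 4 KiB but only 160 at 64 MiB, while a 1-byte file leaves almost an entire 64 MiB block unused. No single block size is good at both ends.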

There is a second aspect of blocks that local and distributed file systems share. Blocks are agnostic about files, i.e., a block does not know which file it belongs to. While that's a no-brainer for local file systems, the storage servers of block-based distributed file systems are somewhat degraded because they only store anonymous blocks. Only the metadata server knows how the blocks make up files.
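A minimal Python sketch of that division of labor follows. The classes BlockStorageServer and MetadataServer and the helper read_file are hypothetical names invented for this illustration; they do not mirror the API of any particular file system.

```python
class BlockStorageServer:
    """Stores anonymous blocks: it knows block IDs, but nothing about files."""

    def __init__(self) -> None:
        self._blocks: dict[int, bytes] = {}

    def put(self, block_id: int, data: bytes) -> None:
        self._blocks[block_id] = data

    def get(self, block_id: int) -> bytes:
        return self._blocks[block_id]


class MetadataServer:
    """The only component that knows which blocks make up which file."""

    def __init__(self) -> None:
        self._files: dict[str, list[int]] = {}

    def append_block(self, path: str, block_id: int) -> None:
        self._files.setdefault(path, []).append(block_id)

    def blocks_of(self, path: str) -> list[int]:
        return list(self._files[path])


def read_file(mds: MetadataServer, storage: BlockStorageServer, path: str) -> bytes:
    """Reassemble a file: ask the metadata server for the block list first."""
    return b"".join(storage.get(block_id) for block_id in mds.blocks_of(path))


if __name__ == "__main__":
    storage = BlockStorageServer()
    mds = MetadataServer()
    # The block IDs are arbitrary; nothing in them refers to the file.
    for block_id, data in ((907, b"hello, "), (12, b"world")):
        storage.put(block_id, data)
        mds.append_block("/demo.txt", block_id)
    print(read_file(mds, storage, "/demo.txt"))
```

Note that the storage server in this sketch could not replicate or repair "a file" on its own, because it has no notion of one.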

Here come the objects

You can imagine the joy in the storage community when systems and standards arrived that allowed choosing a block size per file. This innovation deserved a new term: object. Objects also have a second aspect that makes them great for distributed file system architectures: they raise the abstraction level a bit by making the storage server aware of which file an object belongs to. Objects are not addressed by a file-agnostic block identifier, as blocks are, but by a file identifier and a sequential object number. This has many advantages for the architecture, as storage servers can actually host file system logic (such as replication), which they are well equipped for when they run on commodity hardware.
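The addressing scheme can be sketched in a few lines of Python. This is an illustration under stated assumptions, not XtreemFS code: the ObjectAddress type, the locate function, and the file identifiers are invented here, and the object size is simply passed in as a per-file parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectAddress:
    file_id: str           # identifies the file the object belongs to
    object_number: int     # sequential object number within that file
    offset_in_object: int  # byte position inside the object


def locate(file_id: str, object_size: int, byte_offset: int) -> ObjectAddress:
    """Map a byte offset in a file to the object that holds it.

    object_size is chosen per file, so two files may use different sizes.
    """
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if byte_offset < 0:
        raise ValueError("byte_offset must not be negative")
    object_number, offset_in_object = divmod(byte_offset, object_size)
    return ObjectAddress(file_id, object_number, offset_in_object)


if __name__ == "__main__":
    # The same byte offset lands in different objects depending on the
    # object size picked for each (hypothetical) file.
    print(locate("file-a", 128 * 1024, 300_000))
    print(locate("file-b", 4 * 1024 * 1024, 300_000))
```

Because the file identifier is part of the address, a storage server can tell which objects belong together without asking the metadata server.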

As I hinted earlier, objects are not super-relevant for the user, because all file systems make you work with files (in the case of XtreemFS, even POSIX files). And Amazon's S3 objects are not the objects we are talking about here, because they are not size-limited. Rather, they are files without a hierarchical namespace.