Data Volumes for Containers

The last part of the trilogy is a survey of the storage technologies currently on the market that supply data volumes for Containers.

In one of the previous posts, I listed innovations and technologies sparked by the introduction of Docker’s volume plugins. A rough categorization of these technologies is as follows.

  • Enablement. Obviously most of these volume plugins fall into this category. They connect various storage backends (Glusterfs, Ceph RBD, NAS, Cloud Storage, etc.) to Containers’ mount namespaces so Containers can store and retrieve data from those backends (a sketch of the plugin protocol follows this list).
  • Data Protection. Some technologies enable data protection (backup, snapshot, and replication), a core storage function. I have yet to spot any innovation beyond traditional data protection, though.
  • Mobility. Some technologies assist containers’ mobility: containers can relocate to new homes and find their data already there. I am not so certain all of these technologies work flawlessly, though.
  • Provisioning. Rexray’s blog did a better job covering this than I could.
  • Multi-tenancy. Some, such as BlockBridge, claim to support it. A demo video is available (thanks to Ilya for the pointer); it appears block storage is created based on the tenant’s credentials.
  • Security (or something like that). I have yet to spot anything in this category; is that omission my fault?
  • Performance. Any innovation out there that speeds up Containers’ storage performance?
  • Isolation. Any innovation out there to keep noisy neighbors quiet?
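
For the Enablement category above, the mechanics are worth a closer look. A Docker volume plugin is just a small daemon that answers the volume driver HTTP/JSON protocol on a unix socket; Docker calls it to create volumes and to obtain a mountpoint that it then bind-mounts into the container’s mount namespace. Below is a minimal, hedged sketch in Go. The plugin name (myvolumes), socket path, and staging directory are placeholders, the backend-specific attach logic is stubbed out with a local directory, and a real driver would also implement the Path, Get, List, and Capabilities endpoints.

    // A minimal sketch of a Docker volume plugin. It speaks the volume driver
    // HTTP/JSON protocol on a unix socket; the backend-specific attach logic is
    // stubbed out with a local directory. The plugin name, socket path, and base
    // directory are placeholders, not a real plugin.
    package main

    import (
        "encoding/json"
        "net"
        "net/http"
        "os"
        "path/filepath"
    )

    type request struct {
        Name string
        Opts map[string]string
    }

    type response struct {
        Mountpoint string `json:",omitempty"`
        Err        string `json:",omitempty"`
    }

    const base = "/var/lib/myvolumes" // assumed local staging directory

    func errString(err error) string {
        if err != nil {
            return err.Error()
        }
        return ""
    }

    func handle(fn func(request) response) http.HandlerFunc {
        return func(w http.ResponseWriter, r *http.Request) {
            var req request
            json.NewDecoder(r.Body).Decode(&req) // errors ignored in this sketch
            json.NewEncoder(w).Encode(fn(req))
        }
    }

    func main() {
        mux := http.NewServeMux()
        // Docker probes this endpoint to learn which plugin APIs are implemented.
        mux.HandleFunc("/Plugin.Activate", func(w http.ResponseWriter, r *http.Request) {
            json.NewEncoder(w).Encode(map[string][]string{"Implements": {"VolumeDriver"}})
        })
        // A real driver would provision Ceph RBD, Glusterfs, EBS, etc. here.
        mux.HandleFunc("/VolumeDriver.Create", handle(func(req request) response {
            return response{Err: errString(os.MkdirAll(filepath.Join(base, req.Name), 0755))}
        }))
        // Docker bind-mounts the returned Mountpoint into the container's mount namespace.
        mux.HandleFunc("/VolumeDriver.Mount", handle(func(req request) response {
            return response{Mountpoint: filepath.Join(base, req.Name)}
        }))
        mux.HandleFunc("/VolumeDriver.Unmount", handle(func(req request) response {
            return response{}
        }))
        mux.HandleFunc("/VolumeDriver.Remove", handle(func(req request) response {
            return response{Err: errString(os.RemoveAll(filepath.Join(base, req.Name)))}
        }))

        // Docker discovers plugins via sockets under /run/docker/plugins.
        l, err := net.Listen("unix", "/run/docker/plugins/myvolumes.sock")
        if err != nil {
            panic(err)
        }
        http.Serve(l, mux)
    }

Once such a plugin is listening, a command along the lines of docker volume create -d myvolumes data (on a Docker release with the volume API) routes volume operations to it.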

In Kubernetes, volume drivers live in a similar dimension. We have made significant progress: we support many on-premise and Cloud Storage kinds, and the list keeps growing. We are addressing issues that Container users and infrastructure administrators care about, such as provisioning, security, and multi-tenancy. These technologies can serve different Container Engine deployments (Docker, rkt, hyper, etc.).
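
To make the Kubernetes side concrete, here is a hedged sketch of how a pod declares a remote data volume; the kubelet’s volume driver then performs the actual mount on the node. It uses the present-day k8s.io/api Go packages, and the NFS server, export path, and image are placeholders.

    // Sketch: declaring an NFS-backed data volume in a Kubernetes pod spec using
    // the k8s.io/api Go types, then printing the object as JSON. The NFS server,
    // export path, and image are placeholders; swap the VolumeSource for RBD,
    // Glusterfs, a cloud disk, etc.
    package main

    import (
        "encoding/json"
        "fmt"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func main() {
        pod := corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "web"},
            Spec: corev1.PodSpec{
                Volumes: []corev1.Volume{{
                    Name: "data",
                    VolumeSource: corev1.VolumeSource{
                        NFS: &corev1.NFSVolumeSource{Server: "nfs.example.com", Path: "/exports/web"},
                    },
                }},
                Containers: []corev1.Container{{
                    Name:  "web",
                    Image: "nginx",
                    VolumeMounts: []corev1.VolumeMount{{
                        Name:      "data",
                        MountPath: "/usr/share/nginx/html",
                    }},
                }},
            },
        }
        out, _ := json.MarshalIndent(pod, "", "  ")
        fmt.Println(string(out)) // the kubelet's volume driver performs the actual mount
    }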

Storage Issues in Containers

(Continued from last post on Performance in Virtualized Storage)

Storage issues in Containers are somewhat different from those in hypervisors. Docker containers use two kinds of storage: the storage driver, which backs container images, and the volume driver, which backs so-called data volumes.

Storage drivers are responsible for translating images into a Container’s root filesystem. Docker supports device-mapper, AUFS, OverlayFS, Btrfs, and, recently, ZFS storage drivers. Storage drivers usually support snapshots (natively or emulated) and thin provisioning.
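
To make the storage-driver idea concrete, the sketch below assembles a container root filesystem the way a union-type driver does: one OverlayFS mount combining a read-only image layer with a per-container writable layer. The paths are placeholders; it needs root and a kernel with OverlayFS support, and real drivers add layer management, locking, and cleanup on top.

    // Sketch: the core operation behind an OverlayFS storage driver. A read-only
    // image layer (lowerdir) and a per-container writable layer (upperdir) are
    // combined into the directory the container uses as its root filesystem.
    // All paths are placeholders; run as root on a kernel with OverlayFS support.
    package main

    import (
        "fmt"
        "os"
        "syscall"
    )

    func main() {
        lower := "/var/lib/demo/image-layer"  // read-only image content
        upper := "/var/lib/demo/container-rw" // this container's writes land here
        work := "/var/lib/demo/overlay-work"  // scratch space required by OverlayFS
        merged := "/var/lib/demo/rootfs"      // what the container sees as /

        for _, d := range []string{lower, upper, work, merged} {
            if err := os.MkdirAll(d, 0755); err != nil {
                panic(err)
            }
        }

        opts := fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work)
        if err := syscall.Mount("overlay", merged, "overlay", 0, opts); err != nil {
            panic(err)
        }
        fmt.Println("container rootfs assembled at", merged)
    }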

Not all drivers are equal. My colleague Jeremy Eder has benchmarked storage drivers extensively on his blog.

Most of the performance issues, also expressed as Problem 6 in this LWN article, are caused by (false) sharing: one container’s I/O activity is felt by the others on the shared underlying storage, a.k.a. the noisy neighbor problem.

Naturally, solutions invariably concentrate on jailbreaking shared storage.
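
For contrast, the knob most deployments reach for today is not a new filesystem but the blkio cgroup, which merely caps how much noise a neighbor can make on the shared device. The sketch below shows that conventional approach (cgroup v1 layout; the cgroup path and the 8:0 device numbers are assumptions). The research systems discussed next restructure the storage stack itself instead.

    // Sketch: capping one container's write bandwidth on a shared block device
    // via the blkio cgroup (v1 layout). The cgroup path and the 8:0 (sda) device
    // numbers are assumptions; this limits a noisy neighbor but, unlike the
    // systems below, does not isolate the I/O stack itself.
    package main

    import (
        "fmt"
        "os"
        "path/filepath"
    )

    func main() {
        cgroup := "/sys/fs/cgroup/blkio/docker/<container-id>" // assumed cgroup v1 path
        limit := "8:0 10485760"                                // major:minor bytes/sec (10 MB/s)

        target := filepath.Join(cgroup, "blkio.throttle.write_bps_device")
        if err := os.WriteFile(target, []byte(limit+"\n"), 0644); err != nil {
            panic(err)
        }
        fmt.Println("write bandwidth capped for", cgroup)
    }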

IceFS, despite what the name suggests, was originally meant for hypervisors. Nonetheless, the idea is rich enough to shed light on container storage. IceFS provides physical and namespace isolation for hypervisor (and potentially container) consumers. Such isolation improves reliability and lessens the noisy neighbor problem. I have yet to spot snapshot and thin provisioning support that would ease Docker adoption, though.

SpanFS is like IceFS on isolation but more aggressive: I/O stacks are also isolated, so buffer allocation and scheduling for different containers are completely separated (locks and noise? no more!). The result is astounding: certain microbenchmarks showed SpanFS to be 10x faster than ext4.

Split-Level I/O is somewhat along the same line: I/O stacks are not only isolated but also tagged per process/container/VM, so priority annotation and resource accounting stay well under control. This corrects the priority inversion, and the lower layers’ ignorance of who issued an I/O, that noisy neighbors cause.

Performance in Virtualized Storage

When presenting a block device to a guest VM, hypervisors (ESX and QEMU/KVM) usually emulate the block device on top of the host filesystem, as roughly illustrated below:

[Figure: a guest’s block device emulated as an image file on top of the host filesystem]

Architecture-wise, this is a clean approach: it separates storage concerns from the hypervisor.

However, as pointed out in [1], for write-heavy and latency-sensitive workloads this architecture delivers suboptimal performance (as low as half of wire speed).

There are some projects aiming for better performance:

  • VirtFS. VirtFS essentially bypasses the storage stack in the guest VM. This project, however, appears inactive.
  • Ploop. Strictly speaking, Ploop is not designed for hypervisors. It nonetheless embodies some of the relevant techniques: layout awareness, page cache bypass, etc. (see the sketch after this list).
  • As in [2], special handling of dirty pages using a journaling device. This helps some (mostly write-heavy) workloads.
  • TCMU userspace bypass. This could be a more general framework; it needs a performance focus, though.
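
To illustrate what the page cache bypass mentioned for Ploop above means in practice, the sketch below writes to a backing file with O_DIRECT so the host page cache is skipped and guest data is not cached twice. The path and the 4096-byte alignment are assumptions; O_DIRECT requires the buffer address, file offset, and length to be aligned to the device’s logical block size.

    // Sketch: writing a backing file with O_DIRECT so the host page cache is
    // bypassed and guest data is not cached twice. O_DIRECT requires the buffer
    // address, file offset, and length to be aligned to the device's logical
    // block size; the 4096-byte block size and the path are assumptions.
    package main

    import (
        "fmt"
        "syscall"
        "unsafe"
    )

    const blockSize = 4096

    // alignedBlock returns a blockSize-aligned slice of length blockSize.
    func alignedBlock() []byte {
        buf := make([]byte, blockSize*2)
        off := 0
        if r := uintptr(unsafe.Pointer(&buf[0])) % blockSize; r != 0 {
            off = int(blockSize - r)
        }
        return buf[off : off+blockSize]
    }

    func main() {
        fd, err := syscall.Open("/var/lib/demo/disk.img",
            syscall.O_WRONLY|syscall.O_CREAT|syscall.O_DIRECT, 0644)
        if err != nil {
            panic(err)
        }
        defer syscall.Close(fd)

        block := alignedBlock()
        copy(block, "guest data, written without the host page cache")

        if _, err := syscall.Write(fd, block); err != nil {
            panic(err)
        }
        fmt.Println("wrote one aligned block with O_DIRECT")
    }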

Similarly, object storage systems that use a file store are also concerned about performance loss in nested filesystem layers. Ceph is exploring this direction with NewStore and seemingly favors owning block allocation itself, thus bypassing the file store.

References

  1. “Understanding Performance Implications of Nested File Systems in a Virtualized Environment”, D. Le, H. Huang, and H. Wang, FAST 2012
  2. “Host-side Filesystem Journaling for Durable Shared Storage”, A. Hatzieleftheriou and S. V. Anastasiadis, FAST 2015

List of Docker Volume Drivers

My bookkeeping exercise

Volume Driver      | Supported Remote Storage Type                   | Source
Flocker            | OpenZFS, EMC ScaleIO, NetApp ONTAP, etc.        | ClusterHQ
Ceph RBD           | Ceph RBD                                        | Yahoo, AcalephStorage, VolPlugin
EMC Rexray         | EMC ScaleIO, XtremIO, AWS EBS, OpenStack Cinder | EMC
Convoy             | VFS, NFS                                        | Rancher Labs
Glusterfs          | Glusterfs                                       | Docker
NFS                | NFS                                             | Docker
Azure File Service | Azure File Service                              | Microsoft
iSCSI              | iSCSI                                           | Phoenix-io, Blockbridge
FUSE derivatives   | sshfs, keywhiz-fs                               | Many