When presenting a block device to a guest VM, hypervisors (e.g. ESX and QEMU/KVM) usually emulate the block device on top of a file in the host filesystem, so guest I/O traverses roughly the following stack: guest application → guest filesystem → guest block layer → hypervisor block emulation → host filesystem → host block layer → physical device.
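To make the emulation concrete, here is a minimal Python sketch of the core idea: guest sector reads and writes become plain file I/O against a backing image on the host filesystem. The class and method names (`FileBackedDisk`, `read_sectors`, `write_sectors`) are illustrative, not any hypervisor's actual API.

```python
import os

class FileBackedDisk:
    """Sketch of a hypervisor-style block device backed by a host file.

    Guest sector I/O is translated to pread/pwrite on the image file,
    which then traverses the host filesystem and host block layers.
    """
    SECTOR = 512

    def __init__(self, path, sectors):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        # Sparse backing file: blocks are allocated lazily by the host FS.
        os.ftruncate(self.fd, sectors * self.SECTOR)

    def read_sectors(self, lba, count):
        return os.pread(self.fd, count * self.SECTOR, lba * self.SECTOR)

    def write_sectors(self, lba, data):
        assert len(data) % self.SECTOR == 0
        return os.pwrite(self.fd, data, lba * self.SECTOR)

    def close(self):
        os.close(self.fd)
```

Every guest read or write ends up as file I/O on the image, which is exactly why the host filesystem's allocation, caching, and journaling behavior sits on the guest's I/O path.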
Architecture-wise, this is a clean approach: it separates storage management from the hypervisor.
However, as pointed out by Le et al. (FAST 2012), for write-heavy and latency-sensitive workloads this architecture delivers suboptimal performance (as low as half of wire speed).
There are some projects aiming for better performance:
- VirtFS. VirtFS shares a host filesystem with the guest over virtio-9p, essentially bypassing the duplicated filesystem and block stacks in the guest VM. The project, however, appears inactive.
- Ploop. Strictly speaking, Ploop (OpenVZ's container block device) is not designed for hypervisors. It nonetheless implements some of the relevant techniques: layout-aware I/O, page cache bypass, etc.
- Special handling of dirty pages using a journaling device, as proposed by Hatzieleftheriou and Anastasiadis (FAST 2015). This helps some (mostly write-heavy) workloads.
- TCMU (userspace passthrough for the LIO SCSI target). This could be a more general framework, though it still needs a performance focus.
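Several of the projects above (Ploop in particular) rely on bypassing the host page cache. On Linux this means opening the backing file with O_DIRECT, which imposes alignment requirements on buffers, offsets, and lengths. The helper below is a hypothetical sketch, not taken from any of these projects; O_DIRECT is Linux-specific and unsupported on some filesystems (e.g. tmpfs).

```python
import mmap
import os

def read_direct(path, length=4096, offset=0):
    """Read `length` bytes at `offset`, bypassing the host page cache.

    O_DIRECT requires the buffer address, offset, and length to be
    aligned (typically to 512 bytes or the filesystem's logical block
    size). An anonymous mmap buffer is page-aligned, which satisfies
    the buffer-alignment requirement.
    """
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        buf = mmap.mmap(-1, length)        # page-aligned anonymous buffer
        os.lseek(fd, offset, os.SEEK_SET)
        n = os.readv(fd, [buf])            # data moves directly into buf
        return bytes(buf[:n])
    finally:
        os.close(fd)
```

This is the same mechanism behind hypervisor cache modes such as QEMU's cache=none, which opens the image with O_DIRECT so guest I/O is not double-cached in the host.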
Similarly, object stores that use a file store backend share this concern about performance loss from nested filesystem layers. Ceph is exploring this direction with NewStore and seemingly favors owning block allocation itself, thus bypassing the file store.
- "Understanding performance implications of nested file systems in a virtualized environment", D. Le, H. Huang, H. Wang, FAST 2012
- "Host-side filesystem journaling for durable shared storage", A. Hatzieleftheriou, S. V. Anastasiadis, FAST 2015