Share Host’s Mount Namespace with Docker Containers

This is a follow-up to my previous post about using a Super Privileged Container (SPC) to mount a remote filesystem on the host. That approach drew criticism for hacking into the mount helpers.

I made a Docker patch so that the Docker daemon doesn't isolate the host's mount namespace from containers. Containers are thus able to see and update the host's mount namespace. This feature is turned on through a Docker client option, --hostns=true.
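
For comparison, a rough approximation of the same effect (an illustration of the underlying mechanism, not the patch itself, and assuming the centos image ships util-linux's nsenter) is to enter the host's mount namespace from a privileged container that shares the host's PID namespace:

# docker run --privileged --pid=host -it centos \
      nsenter --mount=/proc/1/ns/mnt -- findmnt /   # nsenter/findmnt come from util-linux in the image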

A running example looks like the following.

First, start a container with --hostns=true:

# docker run --privileged --net=host --hostns=true -v /:/host -i -t centos bash

In another terminal, once the container is up and you have a bash shell, inspect the container's mount namespace:

# pid=`ps -ef |grep docker |grep -v run|grep -v grep|awk '{print $2}'`; b=`ps -ef |grep bash|grep ${pid}|awk '{print $2}'`; cat /proc/${b}/mountinfo

In the output I spotted the following line, indicating that the container and the host share the same mount namespace.

313 261 253:1 / /host rw,relatime shared:1 - ext4 /dev/mapper/fedora--server_host-root rw,data=ordered
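
A quicker sanity check (a small sketch, reusing the ${b} pid captured above) is to compare the mount namespace links directly; if both resolve to the same mnt:[...] inode, the namespaces are shared:

# readlink /proc/1/ns/mnt /proc/${b}/ns/mnt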

Then, in the container's shell, install the glusterfs-fuse package and mount a remote GlusterFS volume:

# yum install glusterfs-fuse attr -y
# mount -t glusterfs gluster_server:kube_vol /host/shared

Go back to the host terminal and check whether the host can see the GlusterFS volume:

# findmnt |grep glusterfs |tail -1
└─/shared gluster_server:kube_vol fuse.glusterfs rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072

So far so good!
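
Cleanup, for what it's worth, can be done from either side since the namespace is shared; on the host, unmounting the mountpoint that findmnt reported should be enough:

# umount /shared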

Manage a Ceph RBD Device without rbd

There are lots of examples of using the rbd(8) command to manage an RBD device, but it is less well known that we can do the same by dealing with sysfs directly.

Some instructions can be found here. A more detailed explanation of the parameters comes from the rbd kernel documentation. Our hero Sebastian bravely showed his usage. I also ventured to validate it on my Fedora 21 box, using my local Ceph container and a pool called kube that contains an image called foo:

# echo "127.0.0.1 name=admin,secret=AQCw/W1VCOQFCRAAbRxkhg3TuCXRS42ols3hqQ== kube foo" > /sys/bus/rbd/add
# ls /dev/rbd/kube/foo -l
lrwxrwxrwx 1 root root 10 Jun 5 13:31 /dev/rbd/kube/foo -> ../../rbd2
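
For completeness, the same sysfs interface tears the mapping down again: per the rbd kernel documentation, you echo the device id (the 2 in rbd2 above) into the remove entry:

# echo 2 > /sys/bus/rbd/remove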

A Tale of Two Virtualizations

In my previous post on Intel's Clear Linux project, I had a few questions about how Intel got KVM to move fast enough to match containers. Basically, Clear Linux aims to make hypervisors first-class citizens in the container world.

Today I looked into another, similar technology called hyper. hyper positions itself as a hypervisor-agnostic, high-performing, and secure alternative to Docker and KVM. Love it or hate it, hyper is able to run both a hypervisor and a container in its own environment.

The architecture, as far as I could tell from peeking at the source, resembles Docker's: a CLI client talks to a hyper daemon through REST, and the daemon, by invoking the QEMU and Docker engines, creates, destroys, or deletes either a VM or a container. hyper understands Docker images (it uses the Docker daemon API), QEMU (it directly execs QEMU commands with a well-tuned configuration), and Pods (which appear similar to Kubernetes Pods, except for the QEMU provisioning).
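
Since the image handling goes through the Docker daemon API, you can get a feel for that half of the stack without hyper at all, e.g. by listing images over Docker's REST endpoint (a minimal sketch, assuming the default /var/run/docker.sock and a curl new enough to know --unix-socket):

# curl --unix-socket /var/run/docker.sock http://localhost/images/json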

hyper comes with hyperstart, a replacement for init(1) that aims for fast startup. To use hyperstart, you have to bake an initrd.

With these two similar initiatives converging hypervisors and containers, I am now daydreaming of a near future when we no longer have to make trade-offs between VMs and containers within a single framework (KVM or Docker).