Afternoon talks were brain tests. There were many good and interesting topics. I started from Sage’s librados talk. In addition to RGW, RBD, and CephFS, Ceph’s librados is also open to developers/users. Sage’s talk was to promote librados to app developers. In fact, it was the building block for RBD, RGW, and CephFS. He started with simple Hello Word type of snippets, then in a more complicated atomic compound and conditional models, following by models on K/V values (random access, structure data). The new RADOS methods run inside I/O path (a .so file) on a per object basis. This is very interesting, you can implement any plugins to add values to your data, e.g. checksum, archive, replication, encryption, etc. The watch/notify mechanism was extensively reviewed, this could implement cache invalidation on this. He mentioned dynamic object in LUA from Noah Wakins that used LUN clent wrapper for librados and made programming RADOS classes easy, VAULTAIRE (preserving all data points no MRTG, a data vault for metrics), ZLOG – CORFU (a high performancing distributed shared log for flash ???), radosfs (hey, not my RadosFS), glados (gluster fs xlator on RADOS), iRODS, Synnefo, and dropbox like app, libradosstriper. He concluded the talk with a list of others in the CAP space: Gluster, Swift, Riak, Cassandra.
Next talk on NFSv4.2 and beyond. Interesting to see the NFSv4 timeline, 12 years into production since working group created. But labeled NFS was much accelerated. Security labels were into RHEL 7 supporting SELinux enforced by server. Sparse file in kernel 3.18 but not in RHEL, it reduced network traffic by not sending holes, good for virtualization. Space reservation (fallocate) in 3.19, not in RHEL yet. Server side copy (no glibc support yet?), IO hint (io_advice). If you have an idea, supply a patch and RFC.
Last talk on Ceph today (4 in a row!) was from SanDisk. The 512TB InfiniFlash was mentioned. He explained a collection of patches to Ceph OSD to make all-flash OSD high performing 6~7x on read. Code in Hammer. He siad TCmalloc increased contetio in sharded thread pool. This was not in JEmalloc. My poor eye sight spotted a ~350K IOPS read with queue depth 100, and they were said to saturate the box (which was 780K IOPS and 7Gb/s).
Also during breakout, I peeked into Facebook’s storage box. A 30-bay 1U server, fan-only cooling (and still able to run without A/C!), no visible vibration reducer.