Datacenter Storage and Coordination

Large-scale datacenter applications are all too frequently bottlenecked by storage systems that cannot provide robust, scalable, easy-to-use storage. I have designed and led the implementation of two robust, high-performance, scalable storage systems for datacenter-scale applications. The first system provides users with an intuitive hash table API. It has achieved application-level read/write throughput in excess of ten million uncached I/O operations per second and a capacity of multiple petabytes, while tolerating significant failures without losing user data. Moreover, this system provides a flexible, easy-to-use Linda-like coordination mechanism that enables programmers to develop parallel applications while avoiding the complexity and brittleness associated with traditional mechanisms such as MPI. The second storage system is a massively scalable distributed file system that enables files sourced from a traditional file server to be accessed simultaneously by tens of thousands of readers.
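
For readers unfamiliar with Linda-style coordination, the following is a minimal, single-process sketch of the general idea, not the API of the system described above. The class name TupleSpace, the method names out, rd, and in_, and the use of None as a wildcard are all assumptions made for illustration. Workers coordinate only by adding tuples to a shared space and removing tuples that match a pattern, with no explicit message passing between them.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space (hypothetical; for illustration only)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        """Add a tuple to the space and wake any waiting readers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        # None in the pattern acts as a wildcard for that field.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def rd(self, pattern, timeout=None):
        """Block until a matching tuple exists; return it without removing it."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                raise TimeoutError("no matching tuple")
            return self._find(pattern)

    def in_(self, pattern, timeout=None):
        """Block until a matching tuple exists; remove and return it."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                raise TimeoutError("no matching tuple")
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup


def worker(space):
    """Take task tuples, square the payload, and publish result tuples."""
    while True:
        _, n = space.in_(("task", None))
        if n < 0:  # a negative payload is the (assumed) shutdown sentinel
            return
        space.out(("result", n, n * n))


def run_demo(num_workers=4, num_tasks=10):
    space = TupleSpace()
    threads = [
        threading.Thread(target=worker, args=(space,)) for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for n in range(num_tasks):
        space.out(("task", n))
    results = {}
    for _ in range(num_tasks):
        _, n, squared = space.in_(("result", None, None))
        results[n] = squared
    for _ in threads:  # one sentinel per worker, sent after all results arrive
        space.out(("task", -1))
    for t in threads:
        t.join()
    return results
```

In this sketch the producer never addresses a particular worker, so workers can be added or removed without changing the code that submits tasks. A real datacenter-scale mechanism would also need distribution, persistence, and failure handling, none of which is modeled here.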

See for full source.