The system will focus on the needs of genomics researchers, offering large-scale data storage, very high-memory research servers for maximum performance, and integration with relevant biological databases.
The CLIMB system comprises over 7,500 CPU cores of processing power, making it the largest single system dedicated to Microbial Bioinformatics research anywhere in the world.
To give users local, high-performance storage, we have deployed IBM GPFS at each of the four sites, providing 500 TB of local storage. This storage is connected to our servers over InfiniBand.
Unlike most supercomputers, the CLIMB system has been designed to provide large amounts of RAM, in order to meet the challenge of processing large, rich biological datasets. By comparison, the Spruce B supercomputer at the Atomic Weapons Research Establishment (number 68 on the Top500 list, November 2014) has 35,000 cores but only 110 TB of RAM.
CLIMB is not designed as a single HPC system, as is often the case in academic computing; rather, it provides a pool of CPU cores and RAM that Medical Microbial Bioinformatics researchers can draw on. The system has been designed to support over 1,000 VMs running simultaneously, potentially supporting most of the Microbial Bioinformatics community within the UK.
For longer-term data storage, for sharing datasets and VMs, and to provide block storage for running VMs, we will be deploying a storage solution based on Ceph. Each site has 27 Dell R730XD servers, each containing 16 × 4 TB HDDs, giving a total raw storage capacity of 6912 TB across the four sites. All data stored in this system is replicated three times, which gives us a usable capacity of 2304 TB.
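The usable figure follows directly from the hardware counts quoted above; a quick sanity check in Python (the constants are the figures from this section, not part of any CLIMB tooling):

```python
# Raw and usable Ceph capacity, computed from the figures above.
SITES = 4
SERVERS_PER_SITE = 27    # Dell R730XD nodes
HDDS_PER_SERVER = 16
HDD_CAPACITY_TB = 4
REPLICATION = 3          # every object is stored three times

raw_tb = SITES * SERVERS_PER_SITE * HDDS_PER_SERVER * HDD_CAPACITY_TB
usable_tb = raw_tb // REPLICATION

print(raw_tb)     # → 6912
print(usable_tb)  # → 2304
```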
OpenStack is an open-source software platform for cloud computing. It controls large pools of processing, storage, and networking resources throughout a data center, which users access through a web interface. Because OpenStack is open source, anyone can access the source code, make whatever changes or modifications they need, and freely share those changes back with the community at large.
Ceph is a scalable, software-defined storage platform that delivers unified object and block storage, making it ideal for cloud-scale environments such as OpenStack. It uses an algorithm called CRUSH (Controlled Replication Under Scalable Hashing) to ensure that data is distributed evenly across the cluster and that all cluster nodes can retrieve data quickly, without any centralized bottlenecks.
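The key idea behind CRUSH is that placement is computed, not looked up: any client can hash an object's name against the list of storage devices and deterministically find where its replicas live. The real algorithm also accounts for device weights and failure domains, but a toy sketch of hash-based "straw" selection (all names here are illustrative, not Ceph's API) conveys the idea:

```python
import hashlib

def straw_select(obj_name, osds, replicas=3):
    """Toy version of hash-based replica placement: every storage device
    (OSD) draws a deterministic pseudo-random 'straw' for the object, and
    the devices with the longest straws hold the replicas.  Any client
    repeating this computation gets the same answer, so no central
    placement table is needed."""
    def straw(osd):
        digest = hashlib.sha256(f"{obj_name}:{osd}".encode()).hexdigest()
        return int(digest, 16)
    return sorted(osds, key=straw, reverse=True)[:replicas]

osds = [f"osd.{i}" for i in range(12)]
placement = straw_select("genome-read-set-42", osds)
print(placement)  # three deterministically chosen OSDs
```

Because the straws are independent per device, adding a device only pulls over the objects whose straw for the new device is longest, which is why this family of algorithms rebalances gracefully as clusters grow.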
GPFS is IBM’s parallel, shared-disk file system for cluster computers. It achieves high performance by allowing data to be accessed from multiple computers at once: blocks of individual files are “striped” across multiple disks and read and written in parallel. GPFS offers excellent scalability, good performance, and fault tolerance (i.e. machines can go down and the file system remains accessible to the others).
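The striping idea can be sketched in a few lines: split a file into fixed-size blocks, deal them round-robin across disks, and read them back in the same order. This is a simplified illustration of the technique, not GPFS itself:

```python
def stripe(data, n_disks, block_size):
    """Split data into fixed-size blocks and assign them round-robin
    across disks, as a parallel file system does; block i lands on
    disk i % n_disks."""
    disks = [[] for _ in range(n_disks)]
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for i, block in enumerate(blocks):
        disks[i % n_disks].append(block)
    return disks

def reassemble(disks):
    """Walk the disks in round-robin order and rebuild the original file."""
    out, i = [], 0
    while True:
        disk = disks[i % len(disks)]
        idx = i // len(disks)
        if idx >= len(disk):
            break
        out.append(disk[idx])
        i += 1
    return b"".join(out)

data = b"ACGT" * 100                      # a 400-byte toy "file"
disks = stripe(data, n_disks=4, block_size=64)
assert reassemble(disks) == data          # round-trips losslessly
```

In a real deployment each disk's blocks are read and written concurrently, which is where the bandwidth multiplication comes from; here the round-robin layout is the point.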