CLIMB-BIG-DATA | Cloud Infrastructure for Microbial Bioinformatics

SYSTEM HIGHLIGHTS

The CLIMB-BIG-DATA computational infrastructure relies on virtualisation: physical computing hardware is repurposed into a scalable system of multiple independent virtual machines (VMs) running on OpenStack (a free, open-source platform for cloud computing), with access to the Ceph platform for object storage. Through Bryn, a user-friendly web portal, users gain instant, free access to their own VMs, preconfigured for microbial genome analysis with powerful but user-friendly resources such as the Genomics Virtual Laboratory and Galaxy. CLIMB-BIG-DATA users have root access on their VMs, so they can install their own software. Over the last five years, our users have fired up over 4900 VMs!

7680 vCPU Cores

The CLIMB system is composed of over 7,500 CPU cores of processing power. This makes it probably the largest single system dedicated to Microbial Bioinformatics research, anywhere in the world.

500 Local Storage (TB)

To provide users with local, high-performance storage, we have deployed IBM GPFS at each of the 4 sites, providing 500TB of local storage. This storage is connected to our servers using InfiniBand.

78 Total RAM (TB)

Unlike most supercomputers, the CLIMB-BIG-DATA system has been designed to provide large amounts of RAM, in order to meet the challenge of processing large, rich biological datasets.

1000 Virtual Machines

The CLIMB-BIG-DATA system provides a pool of CPU cores and RAM for microbial bioinformatics research. The system has been designed to run over 1,000 VMs simultaneously, enough to support most of the microbial bioinformatics community within the UK.
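As a rough illustration of what these headline figures mean per VM, the sketch below divides the stated totals (7680 vCPUs, 78TB of RAM) evenly across 1,000 VMs. The even split, the decimal TB-to-GB conversion, and the absence of any hypervisor overhead are assumptions made for the example; real VM sizes will vary.

```python
# Headline figures quoted on this page.
TOTAL_VCPUS = 7680
TOTAL_RAM_TB = 78
TARGET_VMS = 1000

# Assumption: decimal units (1 TB = 1000 GB) and a perfectly even split.
GB_PER_TB = 1000


def average_share(total: float, vms: int) -> float:
    """Average amount of a resource per VM if it were divided evenly."""
    return total / vms


vcpus_per_vm = average_share(TOTAL_VCPUS, TARGET_VMS)
ram_gb_per_vm = average_share(TOTAL_RAM_TB * GB_PER_TB, TARGET_VMS)

print(f"Average vCPUs per VM: {vcpus_per_vm}")
print(f"Average RAM per VM (GB): {ram_gb_per_vm}")
```

Under these assumptions the average works out to roughly 7.7 vCPUs and 78GB of RAM per VM.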

2304 Cross-Site Replicated Storage (TB)

For long-term data storage, for sharing datasets and VMs, and for providing block storage to running VMs, we deploy a storage solution based on Ceph. Each site has 27 Dell R730XD servers, each containing 16x 4TB HDDs, giving a total raw storage capacity of 6912TB across the 4 sites. All data stored in this system is replicated 3x, which gives us a usable capacity of 2304TB.
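The capacity figures above can be checked with a few lines of arithmetic. The sketch below assumes the Ceph hardware is present at all 4 sites (the site count is taken from the local storage description); the function names are illustrative only.

```python
# Figures quoted in the text. The site count of 4 is an assumption carried
# over from the local storage description.
SITES = 4
SERVERS_PER_SITE = 27
DISKS_PER_SERVER = 16
DISK_TB = 4
REPLICAS = 3


def raw_capacity_tb(sites=SITES, servers=SERVERS_PER_SITE,
                    disks=DISKS_PER_SERVER, disk_tb=DISK_TB) -> int:
    """Total raw disk capacity in TB across every server at every site."""
    return sites * servers * disks * disk_tb


def usable_capacity_tb(raw_tb: int, replicas: int = REPLICAS) -> int:
    """Capacity left once every object is stored `replicas` times."""
    return raw_tb // replicas


raw = raw_capacity_tb()
usable = usable_capacity_tb(raw)
print(raw, usable)  # 6912 2304
```

Each site contributes 27 x 16 x 4 = 1728TB of raw disk, so the 6912TB total only adds up when all 4 sites are counted.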

SYSTEM SPECIFICS

CLIMB-BIG-DATA is powered by OpenStack. We currently use Kilo in our production system.
Our OpenStack system has been installed by OCF and is built using hardware supplied by IBM.
Our cross-site storage system is based upon Ceph, using hardware supplied by Dell and integrated by OCF and Red Hat.
Our networking is provided by Brocade. Our servers are connected using Brocade VDX switches, and our sites are connected together using Brocade Vyatta virtual routers.

CLIMB-BIG-DATA Hardware

IBM, Dell, Red Hat Ceph Storage, Dell PowerEdge R730XD

OpenStack is an open-source software platform for cloud computing. The platform controls large pools of processing, storage, and networking resources throughout a data center. Users access these resources through a web interface. As OpenStack is open-source software, anyone can access the source code, make any changes or modifications they need, and freely share these changes back with the community at large.

Ceph is a scalable, software-defined storage platform delivering unified object and block storage, making it ideal for cloud-scale environments like OpenStack. It uses an algorithm called CRUSH (Controlled Replication Under Scalable Hashing) to ensure that data is evenly distributed across the cluster and that all cluster nodes are able to retrieve data quickly without any centralized bottlenecks.
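To give a feel for how hash-based placement avoids a central lookup table, here is a minimal sketch using rendezvous (highest-random-weight) hashing. This is not CRUSH itself, which also takes device weights and failure domains into account; it only illustrates the underlying idea that any client can compute an object's location from its name. The node names, object names, and function names are hypothetical.

```python
import hashlib
from collections import Counter


def _score(obj_name: str, node: str) -> int:
    """Deterministic pseudo-random score for an (object, node) pair."""
    digest = hashlib.sha256(f"{obj_name}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj_name: str, nodes: list, replicas: int = 3) -> list:
    """Pick `replicas` distinct nodes for an object, with no central directory.

    Every client that knows the node list computes the same answer.
    """
    ranked = sorted(nodes, key=lambda n: _score(obj_name, n), reverse=True)
    return ranked[:replicas]


# Hypothetical cluster of 27 storage nodes.
nodes = [f"node-{i:02d}" for i in range(27)]

placement = place("sample.fastq", nodes)
print(placement)

# Count how many replicas each node receives for 10,000 hypothetical objects.
counts = Counter(n for i in range(10_000) for n in place(f"obj-{i}", nodes))
print(min(counts.values()), max(counts.values()))
```

Because the placement is a pure function of the object name and the node list, replicas spread roughly evenly across nodes and no single server has to be consulted to find them.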

GPFS is IBM’s parallel, shared-disk file system for cluster computers. It provides high performance by allowing data to be accessed from multiple computers at once. GPFS achieves higher input/output performance by “striping” blocks of data from individual files over multiple disks, and reading and writing these blocks in parallel. GPFS offers high scalability, good performance, and fault tolerance (i.e., machines can go down and the filesystem remains accessible to others).
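The striping idea can be sketched in a few lines. The example below is a simplified, in-memory illustration of round-robin block striping and is not how GPFS is implemented; the block size, disk count, and function names are assumptions chosen for the example.

```python
def stripe(data: bytes, n_disks: int, block_size: int) -> list:
    """Split data into fixed-size blocks and deal them round-robin to disks."""
    disks = [[] for _ in range(n_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        disks[block_index % n_disks].append(data[offset:offset + block_size])
    return disks


def reassemble(disks: list) -> bytes:
    """Rebuild the original data by reading one block from each disk in turn."""
    out = []
    longest = max(len(d) for d in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                out.append(disk[row])
    return b"".join(out)


# 1024 bytes striped over 4 hypothetical disks in 100-byte blocks.
data = bytes(range(256)) * 4
disks = stripe(data, n_disks=4, block_size=100)
print([len(d) for d in disks])  # blocks held by each disk
print(reassemble(disks) == data)
```

Since consecutive blocks live on different disks, a large sequential read can be served by all disks at once instead of queueing on one.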