About HPC

The acronym “HPC” is often used as a generic term, covering a variety of approaches to “bigger and faster computing”. But what is meant by bigger and faster?

For the purposes of the varied workflows within our Institutes, it is pertinent to separate HPC into two related approaches: High Performance Computing (HPC) and High Throughput Computing (HTC). However, the hardware installed here is multi-purpose, and both the clusters and the UV hardware support both HPC and HTC computational approaches.

HPC = High Performance Computing

HPC = getting something done quickly, or ‘enabling the difficult’.

High Performance Computing requires a High Performance Computer system (hardware).

HPC (arguably) defines a computational approach using a system with one or more components that are significantly bigger or faster than those typically found in standard servers, in order to get a particular piece of work completed more quickly. For example, an HPC system may have faster processors (more instructions per second), more memory, or a faster bus or internal network. In some cases a particular problem may be practically impossible to solve without HPC hardware: for example, large-memory systems such as the SGI UV make it possible to assemble large genomes (e.g. wheat) using the De Bruijn graph algorithm.
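To illustrate why De Bruijn assembly is so memory-hungry, here is a minimal sketch (with tiny, hypothetical reads and k=4; not the production assembler used here) that builds the graph by holding every (k-1)-mer and its successors in memory. For a wheat-sized genome this table runs to billions of entries, hence the need for a large shared-memory machine:

```python
from collections import defaultdict

def de_bruijn(reads, k):
    """Build a De Bruijn graph: nodes are (k-1)-mers,
    with an edge from each k-mer's prefix to its suffix."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix -> suffix edge
    return graph

# Toy reads; a real assembly keeps billions of such k-mers in RAM at once.
graph = de_bruijn(["ATGGCGT", "GGCGTGC"], k=4)
print(dict(graph))
```

Every node and edge must stay resident while the graph is traversed, which is why the memory footprint, rather than raw CPU speed, is the limiting factor for large genomes.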

An analogy for HPC could be that of using a sports car on a race track to get from one place to another. It will be much faster than other modes of transport, although it is specialised and will only carry a few people at a time.

HTC = High Throughput Computing

HTC = getting lots of things done.

High Throughput approaches are useful where you have many sets of input data, or separated streams of the same data, and you want to process as many as possible at a time. The time each item takes is of less concern, but the sheer number of items to be processed means that a serial approach cannot keep pace and a backlog will build up, so a parallelised throughput approach is taken. For example, processing the output of multiple lanes of a high-throughput sequencer.

A corresponding analogy for HTC is that of many cars, buses and trucks on a multi-lane motorway. The purpose of a motorway is to achieve the maximum throughput of vehicles per hour, or even better, people per hour. Although each vehicle will take longer than a sports car to cover the same distance, there will be many vehicles per hour all travelling in a steady flow.

High Throughput Computing does not necessarily need high-performance components, and an HTC system can be built from commodity hardware, e.g. as a Beowulf cluster. In practice, though, many production-grade clusters are built using well-specified servers that even individually would outstrip the performance of a commodity back-office server. This is often due to the type of CPU installed, e.g. with larger cache capacity, faster RAM, or other compute-friendly features.

HPC systems at the Norwich Bioscience Institutes

The clusters consist of individual nodes, each running Linux and each having more CPU cores and memory (RAM) than a regular server; together they can support a variety of HPC and HTC workloads. The coupling (network) between nodes is 1 Gbit Ethernet.

The UV HPC systems, by contrast, consist of many nodes coupled tightly together using a fast internal bus called NUMAlink. Each system is presented as a Single System Image (SSI), i.e. one copy of Linux with hundreds of CPU cores and several terabytes of RAM. The UV systems were specifically acquired as HPC systems for large genome assembly, but they are also broad enough (in CPU cores and internal coupling) to work very well with HTC workloads.