Container Orchestration on HPC Platforms

The last decade ushered in a new era of software development in which developers can write applications independently of the target environment by packaging them, along with their dependencies and environment variables, inside containers. Numerous studies [1-2] have shown that containers are optimal for building and running applications reliably on Read more…

Maintaining Storage Health over Time

Storage is a crucial aspect of HPC clusters. Storage nodes are typically shared among multiple users and are highly utilized, leaving little unused space. They also see heavy read and write traffic: data is typically written, deleted, and written again multiple times. Read more…

Monitoring in the Big Data Era

Monitoring can be described as a three-step process: collecting, storing, and alerting. Each of these steps is intrinsically simple and easy to understand: collecting is the process of gathering the necessary data, whether that is a temperature sensor reading, a RAM usage counter, power consumption, or the number of Read more…