Protecting Parallel File Systems from Harm

Modern supercomputers are establishing a new era in High-Performance Computing (HPC), providing unprecedented compute power that enables parallel applications to run at massive scale. However, contrary to long-held assumptions about HPC workloads, where applications were compute-bound and write-dominated (such as scientific simulations), modern applications like Deep Learning training are Read more…

Convergence of HPC, AI and Big Data@MACC

HPC services are no longer solely targeted at traditional HPC applications such as highly parallel modeling and simulation tasks. Indeed, the computational power offered by these services is now being used to support data-centric Big Data and Artificial Intelligence (AI) applications. By combining both types of computational paradigms, HPC infrastructures will be Read more…

Container Orchestration on HPC Platforms

The last decade witnessed a new era of software development that allows software developers to write applications independently of the target environment by packaging them along with their dependencies and environment variables inside containers. Numerous studies [1-2] have shown that containers are optimal for building and running applications reliably on Read more…

Maintaining Storage Health over Time

Storage is a crucial aspect of HPC clusters. Storage nodes are typically shared among multiple users and are highly utilized – there isn’t a lot of unused space. The storage nodes also see heavy read and write traffic – data is typically written, deleted, and written again multiple times. Read more…