Date: September 22, 2022 | 3.00 p.m. (GMT+1)
Moderator: Miguel Viana, LIP
Virtual Manager (VM) is a component of the BigHPC implementation that aims to stage and execute application workloads optimally on one of a variety of HPC systems. It consists mainly of two subcomponents: the VM Scheduler and the VM Repository.
The Virtual Manager Scheduler provides an interface for submitting and monitoring application workloads, and it coordinates the allocation of computing resources on the HPC systems. It executes workloads optimally by matching the resource requirements and QoS specified by the user with the available HPC clusters, partitions and QoS reported by the BigHPC Monitoring and Storage Manager components, respectively.
Additionally, the Virtual Manager Repository provides a platform to construct and store, as container images, the software services and applications that support BigHPC workloads. It then serves those uploaded images programmatically when a workload request is submitted to the Virtual Manager Scheduler for execution.
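As a rough illustration of the matching step described for the Virtual Manager Scheduler, the sketch below filters a list of partitions by a workload's node count and QoS and then picks one. It is not the BigHPC implementation: the class names, fields, and the "best fit" selection rule (fewest leftover nodes) are all assumptions made for the example, and the partition data would in practice come from the monitoring components rather than a hard-coded list.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Partition:
    """Hypothetical view of one partition as reported by monitoring."""
    cluster: str
    name: str
    free_nodes: int
    qos: frozenset  # QoS levels this partition offers


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload request submitted by a user."""
    image: str   # container image name held in the repository
    nodes: int   # number of nodes requested
    qos: str     # QoS level requested


def select_partition(workload: Workload,
                     partitions: Iterable[Partition]) -> Optional[Partition]:
    """Return a partition that satisfies the request, or None if none does.

    Assumed rule: among feasible partitions, prefer the one that leaves
    the fewest nodes unused (a simple best-fit heuristic).
    """
    feasible = [
        p for p in partitions
        if p.free_nodes >= workload.nodes and workload.qos in p.qos
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.free_nodes - workload.nodes)


if __name__ == "__main__":
    # Illustrative data only; cluster and partition names are made up.
    available = [
        Partition("cluster-a", "normal", 8, frozenset({"standard"})),
        Partition("cluster-b", "large", 64, frozenset({"standard", "priority"})),
        Partition("cluster-b", "small", 4, frozenset({"priority"})),
    ]
    request = Workload(image="example-app", nodes=4, qos="priority")
    print(select_partition(request, available))
```

A real scheduler would likely weigh more than free node count (queue wait time, storage locality, image availability), so the selection rule here should be read as a placeholder for whatever policy is actually adopted.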
In this talk, we first present a few possible approaches to designing the Virtual Manager, then discuss the pros and cons of each approach, and finally describe the approach that we determined to be most feasible and adopted in the BigHPC implementation.
About the speakers:
Amit Ruhela works as a Research Associate in the HPC group at TACC, Austin. He earned his Ph.D. in computer science from IIT Delhi and gained postdoctoral experience at The Ohio State University. His research interests focus on feature and performance enhancements in MPI communication through novel and innovative designs. Amit Ruhela also has deep interests in Big Data, Machine Learning, Social Computing, and Information Systems.
John Cazes joined TACC in March 2005. Prior to TACC, he served as Outreach lead to Naval Oceanographic Office Major Shared Resource Center (DOD) users for Lockheed Martin. Currently, he serves as the director of the High Performance Computing group at TACC. He has over 20 years of experience in high performance computing in public and private industry. John Cazes relies on his background in HPC, astrophysics, and climate/weather/ocean modeling to support the wide variety of researchers on TACC resources. His primary research interests are parallel I/O and advanced architectures.
Stephen Lien Harrell is an Engineering Scientist at the Texas Advanced Computing Center in the HPC Performance and Architectures group. His research interests include performance portability, performance modeling, benchmarking and HPC metric capture. Before his current appointment, Stephen worked as an HPC System Administrator and HPC Support Staff for twelve years. He received his bachelor's degree in Computer Science from Purdue University.
The BigHPC Project is co-financed by the European Regional Development Fund through the Operational Program for Competitiveness and Internationalisation – COMPETE 2020, the Lisbon Portugal Regional Operational Program – Lisboa 2020 and the Portuguese Foundation for Science and Technology – FCT under UT Austin Portugal.