Date: September 22, 2022 | 3.00 p.m. (GMT+1)
Speakers: Amit Ruhela, John Cazes, Om Saran, Stephen Harrell (TACC & UT Austin), and Sangamithra Goutham (Systems and Storage Lab at UT)
Moderator: Miguel Viana, LIP
The Virtual Manager (VM) is a component of the BigHPC implementation that aims to stage and execute application workloads optimally on one of a variety of HPC systems. It consists mainly of two subcomponents, i.e., the VM Scheduler and the VM Repository.
The Virtual Manager Scheduler provides an interface to submit and monitor application workloads, coordinates the allocation of computing resources on the HPC systems, and executes workloads optimally. It does so by matching the resource requirements and QoS specified by the user against the available HPC clusters, partitions, and QoS reported by the BigHPC Monitoring and Storage Manager components.
Additionally, the Virtual Manager Repository provides a platform to construct and store, as container images, the software services and applications that support BigHPC workloads. It then serves those uploaded images programmatically when a workload request is submitted to the Virtual Manager Scheduler for execution.
In this talk, we first present a few possible approaches to designing the Virtual Manager, then discuss the pros and cons of each approach, and finally describe the approach that we determined to be most feasible and adopted in the BigHPC implementation.
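To make the matching step more concrete, the following is a minimal Python sketch of how a scheduler might pair a workload with a partition and look up its container image. It is an illustration only, not the BigHPC implementation: the names Partition, Workload, select_partition, and resolve_image, the field layout, and the "best fit" tie-breaking rule are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Partition:
    """Hypothetical view of a partition as a monitoring component might report it."""
    cluster: str
    name: str
    free_nodes: int
    qos: frozenset  # QoS levels this partition offers


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload request as a user might submit it."""
    image: str  # name of the container image to run
    nodes: int  # number of nodes requested
    qos: str    # QoS level requested


def select_partition(workload: Workload,
                     partitions: Iterable[Partition]) -> Optional[Partition]:
    """Return a partition that satisfies the request, or None if none does.

    Assumption: among feasible partitions, prefer the one that leaves the
    fewest nodes unused (a simple best-fit heuristic chosen for illustration).
    """
    candidates = [
        p for p in partitions
        if p.free_nodes >= workload.nodes and workload.qos in p.qos
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.free_nodes - workload.nodes)


def resolve_image(repository: Dict[str, str], name: str) -> str:
    """Look up an image reference by name in a dict standing in for the repository."""
    return repository[name]
```

Under these assumptions, a request for 6 nodes at a "standard" QoS would be placed on the smallest partition that has at least 6 free nodes and offers that QoS, and the image reference would be fetched from the repository mapping before launch.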
Check out the speakers’ presentation here.
About the speakers:
Amit Ruhela works as a Research Associate in the HPC group at TACC, Austin. He earned his Ph.D. in computer science from IIT Delhi and gained postdoctoral experience at The Ohio State University. His research focuses on feature and performance enhancements in MPI communication through novel designs. Amit Ruhela also has a deep interest in Big Data, Machine Learning, Social Computing, and Information Systems.
John Cazes joined TACC in March 2005. Prior to TACC, he served as Outreach lead to Naval Oceanographic Office Major Shared Resource Center (DOD) users for Lockheed Martin. Currently, he serves as the director of the High Performance Computing group at TACC. He has over 20 years of experience in high performance computing in public and private industry. John Cazes relies on his background in HPC, astrophysics, and climate/weather/ocean modeling to support the wide variety of researchers on TACC resources. His primary research interests are parallel I/O and advanced architectures.
Om Saran is a second-year Computer Science Master’s student at UT Austin. He is collaborating with TACC as a research assistant to build the Virtual Manager for the BigHPC project. He is interested in systems and has previously worked at Nutanix. He is expected to graduate in May 2023.
Sangamithra Goutham is a second-year Master’s student at The University of Texas at Austin. She works at the Systems and Storage Lab at UT as a Graduate Research Assistant and previously completed her five-year integrated Master’s degree at College of Engineering Guindy, Anna University. Her research interest lies in pursuing sustainability by optimizing resource consumption in large-scale systems.
Stephen Lien Harrell is an Engineering Scientist at the Texas Advanced Computing Center in the HPC Performance and Architectures group. His research interests include performance portability, performance modeling, benchmarking, and HPC metric capture. Before his current appointment, Stephen worked as an HPC System Administrator and HPC Support Staff for twelve years. He received his bachelor’s degree in Computer Science from Purdue University.
About the moderator:
Miguel Viana is a Linux SysAdmin at LIP-Minho. He graduated with a degree in Industrial Electronics Engineering and is finishing a master’s dissertation in Embedded Systems. He is interested in subject areas such as high-performance computing, automation, containerization, and information security. He currently works mainly on the BigHPC project.
The BigHPC Project is co-financed by the European Regional Development Fund through the Operational Program for Competitiveness and Internationalisation – COMPETE 2020, the Lisbon Portugal Regional Operational Program – Lisboa 2020 and the Portuguese Foundation for Science and Technology – FCT under UT Austin Portugal.