Participate in PPerfLab research on developing performance tools for medium- to large-scale HPC workflows. We are developing monitoring tools that give developers useful feedback to guide them in addressing performance issues in their code, particularly issues related to workflows (applications composed of some number of separate applications, libraries, and/or platforms) and data movement. This development requires testing various codes and gathering performance measurements with a number of different performance tools.
Job Description
This work will entail conducting experiments with MPI- and OpenACC-based codes, possibly in combination with Python scripts and a visualization tool. You will learn and use a variety of tools to evaluate runtime performance, with a particular focus on data movement and storage. You will receive training in using a shared cluster environment, the Lustre parallel file system, and a variety of development tools, as well as additional training, if needed, in writing simple MPI- and OpenACC-based codes. One application of focus will be a drought prediction code developed at Portland State in a collaborative research project.
Computational Resources
Use of XSEDE Resources
This project will use time on Linux clusters, on the PSU Coeus cluster, and on machines in the PI's laboratory. Our lab has only a small, older 16-node Linux cluster with a "mini-Lustre" installation for initial learning. We hope to use SDSC's Comet or a similar XSEDE resource so that the student can learn how to run science codes at medium scale using SLURM or a similar resource manager and state-of-the-art performance tools.
Contribution to Community
This project will provide direct HPC training and experience to two undergraduate computer science students. They will be part of our PPerfLab research group and will be mentored by both the PI and the group's Ph.D. students. We will encourage them to present their work in a poster.
Position Type
Apprentice
Training Plan
The students are encouraged to take Parallel Programming or Introduction to Performance in the Winter quarter if possible; they will then already have some exposure to MPI and to the basics of parallel computing and performance. Our first step will be to give them hands-on practice running a set of codes we provide with some measurement tools, which will require learning the performance tools themselves. Once they are past that learning curve, they will do actual runs to collect the data we need for our research, first on our own facilities and then, if they have been fast learners, on a larger remote facility. They will be welcome to contribute ideas for new tool features we might develop. The PPerfLab graduate students will participate in mentoring and training the undergraduates.
Student Prerequisites/Conditions/Qualifications
Students must have completed a course in operating systems (at PSU, CS 201 and CS 333) and be able to program in C/C++ in a Linux environment. They must have good written and oral English communication skills to participate in research group meetings and to prepare a poster on their work.
Ideally, the students will have taken either Parallel Programming or Introduction to Performance.