Helping to Weather the Storm: Climate Scientist Mentors GPU Hackathon Teams


 


In this profile series, we interview mentors from all walks of life: those who strive to solve the greatest challenges of our time, who work to spearhead technology advancements, and who collaborate with the developer community to enable scientific discoveries.

Interested in becoming a mentor? Apply today.

Meet Matthew Norman, a computational climate scientist who leads the Advanced Computing for Life Sciences and Engineering Group at Oak Ridge National Laboratory (ORNL), where he works on numerical algorithms and GPU programming for numerical climate models. He serves as a liaison for weather and climate model developers using Oak Ridge Leadership Computing Facility (OLCF) resources, and as a liaison for the Energy Exascale Earth System Model (E3SM) awards under the INCITE program and through funding from the DOE Exascale Computing Project. Matthew holds a Ph.D. in Atmospheric Science from North Carolina State University, earned through the U.S. Department of Energy’s Computational Science Graduate Fellowship (CSGF).

How did you get started in computational science?

I received my doctorate from North Carolina State in atmospheric science through the Computational Science Graduate Fellowship, a Department of Energy program that provides benefits and opportunities to students pursuing doctoral degrees in fields that use high-performance computing to solve complex science and engineering problems. Before that, I completed two degrees, one in computer science and one in meteorology, so I was interested in the combination of the two from the start.

My background is traditionally in algorithm development for the equations that govern weather and climate fluid flow, creating numerical methods that are a little more holistic and consider as many constraints as possible when optimizing the overall problems to get accuracy and efficiency. I'm engaged with a number of codes in the weather and climate and computational fluid dynamics (CFD) realm. I also enjoy creating training codes and articles to help people understand performance, and I have a number of examples on GitHub.

I’ve worked on GPU porting from the start, even when I was in graduate school, but the funny thing is the GPU was really an afterthought. I considered it an “add-on” to the more numerical work that had been done, but everyone seemed most interested in and excited about the GPU porting from my dissertation. I think the reason for that is similar to the reason why we have hackathons in the first place: There is a lot of mystery surrounding how to write code for GPUs and get the code to perform efficiently. They saw someone who had already done it, and I think that was more exciting than the algorithmic development.

How did you get involved with the GPU Hackathons program?

Simply put, I was asked to participate. The very first hackathon was held a few years ago at the Oak Ridge Leadership Computing Facility (OLCF) and organized by a colleague of mine. She asked me to participate, and I found the idea of the hackathon interesting and was willing to help.

What kind of benefits do GPU Hackathons offer to the community? 

The benefits of GPU Hackathons for the OLCF and any facilities that primarily rely on GPUs are obvious: The more people who are familiar with or proficient on GPUs, the more who come to our facilities and use our machines to get more science done.

It is not easy to clearly communicate to people how to efficiently run a code on GPUs. There is a lot of mystery surrounding them and I think most developers perceive a big barrier to entry. This is definitely something hackathons uniquely help with. They help provide hands-on interaction and expertise to overcome that barrier by providing developers with somebody to talk to one-on-one so they are more willing to consider trying out GPUs. Since GPUs can speed up a number of codes pretty significantly, we're helping them get more scientific results out of their computer allocations.

Another benefit to the community is the significant improvement in compiler robustness that came out of these events. Especially during the early hackathons, it almost felt like a bug reporting festival. There were a ton of bugs reported for the compilers because the technologies were relatively new, and we were exercising them in ways that hadn't been exercised before. There were even times when a compiler developer would fix a bug on the spot, patch the compiler, and send it back out during the hackathon event.

Today, there are certainly fewer bugs. It’s very different from when we first started, and the hackathon events are more varied. In some events, we have actual compiler developers present, and others are at a higher level. Ultimately, the events are evolving as the users are evolving. Depending on who's present at the time of the event, the criteria for what is considered critical to making progress at the hackathon change.

What made you decide to continue as a GPU Hackathon mentor?

First and foremost, I like the GPU Hackathons. I think they are a fun environment that pushes productivity in a way you do not get in a normal workday. There is an immediate “intensity,” with teams sometimes continuing well into the night to work through something, and I find the atmosphere really fun.

Second, I like the exposure to different domains and sciences that I never would have seen otherwise. I also like the exposure to different styles of coding. There's not just one way to write a Fortran or C++ code. There seem to be hundreds of ways, and everyone has a different “flavor.” The more exposure you have, the more you realize when a concept becomes general and when it's just specific to one style of coding.

I also like the collaborations that result from a hackathon event. There are times when I'll keep in touch with the teams after the fact.

More than anything, I do really enjoy taking something that another person views as complex and making it more approachable, down to earth and practical.

What are some of the challenges and successes as a mentor?

I have been a mentor for 10 or so hackathons, and one of the biggest challenges I’ve experienced is navigating through the team’s first realization of just how much work it's going to take to make the code run efficiently on a GPU.

Teams may come in hoping that some minor changes to their code will make it run efficiently on a GPU. That can sometimes happen, but it is rare, especially in domains like fluid dynamics. So, teams get an idea of just how much work is required, and it can lead to a sense of disappointment. That is definitely a challenge for a mentor.

What mitigates that sense of disappointment is coaching the team so they clearly understand what they need to do and how to get there in steps, which makes the approach feel more feasible. It is no longer one “giant workload”; it is a set of smaller workloads they now know how to do in steps or pieces, and they know what the payoff is. So, it becomes more attainable and worth it. That is one for the “success” column as a mentor.

For example, at one of the hackathons I mentored, we had a fluid dynamics code, and I could see the moment when things finally “clicked”: the team realized exactly what had to be done to make the code run well on GPUs in their context. Instead of feeling disappointed, they seemed excited to go ahead and refactor the code because they were happy to have an understanding of how to get their code to run efficiently.

There are many cases where a team will have a well-tuned CPU code that performs nicely in cache and vectorization; then, they refactor for GPUs and realize that it harms CPU performance. My goal as a mentor is to help the team understand the trade-offs of one approach versus another and decide between them. It is an imperfect world when it comes to GPU porting and refactoring, and hard choices sometimes need to be made. However, if the team examines different approaches and ultimately decides on a pathway, they are happier knowing that they had the experience and feel validated in their final approach.

What are some of your key takeaways as a mentor for the GPU Hackathons? 

I feel that the skills I use most often in these environments are soft skills. 

One thing I’ve learned at my job in the Scientific Engagement Section at OLCF, where we liaise with projects that run under the INCITE allocation on our resources, is this: It takes a measure of trust to really work well with another team. This is true for the GPU Hackathons as well. It is not easy for a developer to hear advice from someone they do not know, and while the relationship can initially be difficult, focusing on building that trust and rapport early on is important. 

There are some easy ways to do that. You can build trust through conversation, going through ground level details of the code or project and clarifying things. You can build trust by helping a team address small things they are encountering so the team sees that your efforts are intended to be helpful for the project. Then the team is more likely to listen to your advice.

It is also really important to let the team stay in the driver's seat for their codes. You are there to be a resource to facilitate their decisions and help them determine what works best in their context; but ultimately the decision is the team’s because as a mentor you often do not know everything that's going on in their context.

Another takeaway is that it is essential for mentors to require the team to do most of the work themselves. Hackathons are primarily training events, and not everything can be done all at once. A mentor is there to give the team a picture of what the work steps will look like and have them leave understanding exactly what they need to do when they return to their institution.

I have had teams that asked for specific coding work, but I usually channel that request into digging through their code to find the breadth of situations they might encounter—reductions or atomics, race conditions, or data movement—and I help them understand the whole picture. When it comes to specific actions, I have the team handle it.

What advice would you then give to someone who wants to become a mentor at a GPU Hackathon?

Honestly, the key is just to be empathetic. As a mentor, you will typically have a team that has an idea of what they want, even if they do not know exactly how to do it. There will be a meeting point where you and the group are no longer talking past each other; instead, you're discussing specifics about efficiency in GPU refactoring.

I would say: be empathetic, be understanding, and try to interface with them the best you can.

Any final thoughts?

Admittedly, it is a difficult landscape to live in these days with all manner of heterogeneous machines. While it can be discouraging at times, as a scientific and computational community we need to keep working towards that middle point where everyone can work in the same code base and understand what is going on, even if they must make sacrifices.

It is important to just keep charging towards the optimal point where there is good software engineering, good code readability and good performance.


Author(s)

Izumi Barker

Izumi Barker is a program manager for GPU hackathons and bootcamps at NVIDIA and public relations director for OpenACC-Standard Organization, bringing more than twenty years of experience in communications, strategic marketing, and product management. Prior to her roles at NVIDIA and OpenACC Organization, Izumi held positions across multiple industries including University of Phoenix under Apollo Education Group, Cengage Learning, Bio-Rad Laboratories, Annual Reviews, Cystic Fibrosis Foundation, Ernst & Young, LLP, as well as several start-ups.