Research Computing | HPC
The purpose of this presentation is to provide an overview of High Performance Computing, which is abbreviated as HPC. In this video, I will present some examples of applications that use HPC, then describe what an HPC computer is, then discuss Moore’s Law, Amdahl’s Law, shared memory parallel programming, and distributed memory parallel programming, and then give my concluding remarks.

Some important applications that use HPC are: car crash testing, diesel engine design, jet engine design, genome mapping, financial industry applications, and Artificial Intelligence (AI), including deep learning. For all these applications, speed is critical.

Instead of building a prototype car and crashing
it, the car industry simulates car crashes using HPC machines. This is much less expensive
than building prototypes and allows the testing of many designs in a short amount of time.

HPC is also used in the design of diesel engines. Cummins has a complex application program that simulates the entire running diesel engine. Using HPC, they can perform one simulation in less than 4 hours. This allows them to optimize power, cost, efficiency, pollution, and other factors over thousands of design parameters.

There are also complex application programs
that simulate the running of a jet engine. These programs require HPC to run in a timely
manner. This allows jet engine designers to test many different designs and to optimize
over design parameters.

Genome mapping calculations are not possible without HPC.

HPC is also critical for the financial industry, where high speed trading involves thousands of transactions performed every second. Using worldwide bond trading data, bond firms use HPC to predict future bond prices in a matter of a few hours. This information is then used to buy and sell bonds.

Today, Artificial Intelligence and Deep Learning are active areas of research. For both, HPC is essential for processing very large training sets in a reasonable amount of time. Speech recognition is one Deep Learning application that also requires HPC.

The design of HPC machines is continually
changing and evolving. There are many designs of HPC computers, but most contain hundreds to hundreds of thousands of compute nodes, with each compute node having one or more processors sharing a common memory and each processor having multiple cores. Compute nodes are interconnected via a high-speed communication network. Compute nodes may also contain accelerators, such as an Nvidia GPU or an Intel Xeon Phi coprocessor, for added speed.

Moore’s Law is commonly used to predict the performance of future HPC machines. Moore’s Law is not a law but an observation made by Intel co-founder Gordon Moore in 1965. He noticed that the number of transistors per square inch on integrated circuits had doubled every 18 months since their invention. Moore’s observation has generally held true through today.

Amdahl’s Law governs program
speedup when using p processors for an application assuming no parallelization overhead. Suppose
an application takes 100 hours to run with a single processor and 95 hours can be perfectly
parallelized and 5 hours cannot be parallelized. Then the program execution time using p processors is 5 + 95/p hours, so the speedup = (serial time)/(parallel time) = 100/(5 + 95/p), which approaches 20 as p approaches infinity.
Because of this, for many years people thought parallelization had limited usefulness. Thankfully, for most applications, as the problem size increases, the fraction of time spent in serial execution approaches 0.

Special programs are needed
of the multiple processors in HPC computers. Since each compute node has a shared memory,
one can use shared memory parallelization techniques, for example OpenMP. To use multiple
compute nodes, one must use distributed memory parallelization techniques, for example MPI, which stands for Message Passing Interface. MPI may also be used to parallelize programs for shared memory computers.

The common method used to write parallel programs
for shared memory computers is to use OpenMP. This is accomplished by inserting OpenMP directives
or pragmas into serial Fortran, C, or C++ programs. For example, the “parallel do” directive tells the compiler that different parallel threads can execute different iterations of the loop. The following slide shows one way to parallelize a = b + c, where a, b, and c are one-dimensional arrays.

In the following Fortran program, the “parallel do” directive starts a parallel region and tells the compiler to parallelize the following loop. The “end parallel do” directive ends the parallel region. The iterations of the do loop will be divided among OpenMP threads and executed in parallel.

Programs for distributed memory computers
usually use MPI routines for passing messages between processors, for example by inserting calls to mpi_send and mpi_recv in a Fortran, C, or C++ program. To write an MPI program, one must first decide how to distribute data and computation among processors to provide good load balancing and to minimize message passing time.

This example shows how to compute the global dot product of two one-dimensional arrays a and b distributed across p MPI processes.
Each MPI process executes the same program. After the local dot products have been calculated, the value of the global dot product is placed on all p processes by calling mpi_allreduce.

In conclusion, HPC is a critical technology
for many important applications.

OpenMP is used to write parallel programs for shared memory machines. A comprehensive OpenMP tutorial is available from Lawrence Livermore National Laboratory.

MPI is used to write parallel programs for both shared memory and distributed memory machines. A comprehensive MPI tutorial is available from Lawrence Livermore National Laboratory.

I would like to thank the College of Liberal Arts and Sciences for their support of this project.
