The use of high-performance computing (HPC) in data modeling, AI, and analytics has already significantly exceeded expectations. But the next decade will see explosive growth in capability and performance, achieved with specialized new processors and industry-standard software programmability.
As we battle global threats, from COVID-19 to global warming, achieving new heights in data analysis will enable new solutions to world problems. The next major milestone in HPC is “exascale”: more than one billion billion (10^18) calculations per second. These new exascale systems combine the latest technologies in AI, engineering, and science to achieve performance faster than any supercomputer before. But perhaps most surprisingly, these computers will use technology from videogames to achieve this goal.
In 2019, Hyperion Research predicted that 26 exascale and near-exascale HPC systems would be deployed between 2020 and 2025, with spending rising to $39 billion in 2023. Hyperion also estimated that HPC applied to AI would experience an almost 30 percent compound annual growth rate. The largest of these HPC systems, built by government institutions, are the exascale systems. Managing this computing power will pose a challenge.
Trip Down Memory Lane
Companies have used a variety of processor architectures over the years. In the past, ordinary CPUs could achieve high performance if enough of them were connected together in a supercomputer. But over the last few years, graphics processors (GPUs, a technology from videogames) have increasingly become the only practical way of achieving very high levels of performance. Software for these complex compute systems needs to be written by scientists, so CPU-based systems already provided a standard C, C++, or Fortran programming approach.
Nvidia successfully brought GPUs into these systems with its proprietary programming environment, CUDA™, which has since been adopted by many HPC programmers. But CUDA could not serve as the standardized programming model required by the US Department of Energy’s supercomputing projects. For that, OpenACC™ was created by Nvidia, Cray, CAPS, and PGI as an offshoot of OpenMP™ to offer a relatively open standard. OpenACC was still mainly driven by Nvidia to compete with OpenMP, the most popular HPC programming model at the time, though OpenMP too was aging in its support for the object-oriented programming style favored by C++. Over time, most of the OpenACC standard has been folded into OpenMP, especially after PGI was bought by Nvidia, CAPS folded, and, in 2019, the director of the Cray Supercomputing Center announced that Cray was ending support for OpenACC.
Companies realized that none of these approaches was viable in the long term, especially in an era when many new processor architectures were becoming available for accelerating AI workloads and increasingly demanding benchmarks had to be met. CUDA was widely adopted and provided an excellent solution, but it was not an open standard; an alternative was required that would offer all the benefits of CUDA together with an ecosystem of developers and multiple hardware suppliers.
Forced Standard Created ‘Square Pegs’ Dilemma
Historically, supercomputers such as Titan, the successor to the CPU-only Jaguar, combined CPUs and GPUs to reach high levels of performance. But the programming model lost its separation of concerns and could not remain stable. The problem was that building a supercomputer from a mix of CPUs and GPUs (called “heterogeneous processing”) creates programming challenges that existing models were never designed to handle.
So a new standard was needed. With heterogeneous computing, one can’t simply add a few annotations (called “pragmas”) to a few performance-critical areas of the software and expect the inherent problems to go away; one needs to think holistically to drastically improve performance and programmability. OpenMP and OpenACC, originally designed for parallel computing on multi-core CPUs, were heroically adapted to heterogeneous computing through the pragma model, but the result was weak in separation of concerns, error handling, and object-oriented resource management, all of which the newest generation of code demands. The CUDA programming model also failed to meet expectations because it was proprietary to Nvidia’s hardware and locked enterprises into a single vendor.
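To make the pragma approach concrete, here is a minimal sketch (not drawn from any particular project) of offloading a simple loop with OpenMP target directives. The annotations handle data movement and parallelization well enough for straightforward loops, but they offer little help with error handling or object-oriented resource management.

    #include <cstdio>
    #include <vector>

    // Minimal sketch: a vector addition offloaded with OpenMP target directives.
    // The pragma maps the input arrays to the accelerator, runs the loop there,
    // and copies the result back. Device selection and error handling stay implicit.
    void vector_add(const float* a, const float* b, float* c, int n) {
        #pragma omp target teams distribute parallel for \
            map(to: a[0:n], b[0:n]) map(from: c[0:n])
        for (int i = 0; i < n; ++i) {
            c[i] = a[i] + b[i];
        }
    }

    int main() {
        const int n = 1024;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);
        vector_add(a.data(), b.data(), c.data(), n);
        std::printf("c[0] = %f\n", c[0]); // expect 3.0
    }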
As systems designers have come to realize, adding features to ISO C++ takes many years. Khronos SYCL™ fills this niche and satisfies those needs. It is open, industry-defined, and supports heterogeneous processing. And because it is not controlled by any one company, code written in it can run on tomorrow’s computers. It works on systems built from Intel GPUs and FPGAs, AMD GPUs, and Xilinx FPGAs. So exascale computing can be reached regardless of hardware vendor.
Clearly Defined Standards Take a Stand
Momentum is building towards SYCL, a Khronos® open-standard alternative that provides many of the standard C++ programming capabilities that scientists and other programmers need.
Companies can now use the same standard programming model with non-Nvidia processors as well as Nvidia GPUs. This sets a new high-water mark for standardized heterogeneous computing, ideal for enterprises that employ accelerated processors. The goal is to leverage an open standard, which has long been a requirement in high-performance computing, where a workload can live for 20 years while the hardware changes every five. The industry as a whole needed to return to an open-standard programming model and bring HPC back to an open platform.
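As a rough illustration (a sketch, not code from any particular vendor), the same kind of loop written in SYCL is ordinary C++ that a SYCL compiler can target at CPUs, GPUs, or FPGAs from different vendors.

    #include <sycl/sycl.hpp>
    #include <cstdio>
    #include <vector>

    int main() {
        const size_t n = 1024;
        std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

        sycl::queue q;  // selects a default device: CPU, GPU, or FPGA
        {
            // Buffers wrap host memory; the runtime manages data movement.
            sycl::buffer<float> A(a.data(), sycl::range<1>(n));
            sycl::buffer<float> B(b.data(), sycl::range<1>(n));
            sycl::buffer<float> C(c.data(), sycl::range<1>(n));

            q.submit([&](sycl::handler& h) {
                sycl::accessor ra(A, h, sycl::read_only);
                sycl::accessor rb(B, h, sycl::read_only);
                sycl::accessor wc(C, h, sycl::write_only);
                // The kernel itself is plain C++, expressed as a lambda.
                h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                    wc[i] = ra[i] + rb[i];
                });
            });
        }  // buffer destructors synchronize results back to the host vectors
        std::printf("c[0] = %f\n", c[0]); // expect 3.0
    }

The same source can be recompiled for a different accelerator simply by switching the SYCL compiler’s target backend, which is the portability argument made above.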
ISO C++, despite its recent fast ratification cycle of every three years, will still take a long time to adapt to heterogeneous computing. That effort is ongoing but will likely take 6-10 years before something fully usable is ratified. Many have tried to do heterogeneous computing with C and other available libraries, but the feeling was that they needed to rely on something external, open, and scalable for the future. Such approaches may have worked internally or on one architecture, but as a standard programming model they could not be relied upon.
The world of HPC is rapidly expanding. Some will adopt an in-house approach; others will seek refuge in cloud-based solutions. But the need for standards-based, easy-to-integrate solutions will always remain a top priority.
For more information, visit www.codeplay.com.