The Portland Group, a wholly-owned subsidiary of STMicroelectronics, has announced the general availability of the PGI Release 8.0 line of high-performance compilers and development tools for Linux, Mac OS X and Windows. PGI Release 8.0 includes full support for the recently announced OpenMP 3.0 multi-core parallel programming standard in Fortran and C across all supported platforms.
The new release also adds support for building and debugging OpenMPI applications on both Linux and Mac OS X, complementing existing MPI capabilities on Linux and Windows clusters. PGI 8.0 users can now develop and deploy multi-core and parallel applications on any of the major desktop or cluster operating systems using identical PGI compilers, the latest OpenMP features, their MPI implementation of choice and bundled OpenMP/MPI-capable debugging and profiling tools.
In a significant new development, PGI Release 8.0 also marks The Portland Group’s entry into the field of accelerated computing with provisional support for automatic offloading of parallel computations from x64 host processors to CUDA-enabled GPUs from NVIDIA.
Douglas Miles, director of The Portland Group, said that, together with PGI Unified Binary technology, which enables developers to leverage the latest CPU innovations from both AMD and Intel while treating x64 processors as a single platform, the new features in PGI 8.0 maximise flexibility and independence for HPC users and large multi-platform supercomputing centers.
New performance analysis tools
Building on a compiler and tools product line that already covers HPC and multi-core programming technologies, the PGI 8.0 compilers add an all-new capability: they automatically analyse source code, produce an extensive database describing which performance optimisations are possible or inhibited, and provide advice on modifying the source code to take advantage of the possible optimisations.
With Release 8.0, PGI has standardised the organisation and interface to this data through the Common Compiler Feedback Format (CCFF). PGI is publishing the CCFF standard and making access to it freely available in an effort to improve the utility and interoperability of PGI, third-party and research-community software tuning tools.
PGI’s PGPROF 8.0 performance profiler displays CCFF data coupled with user source code in a logical, compact and intuitive graphical user interface (GUI). A command-line interface is also supported. Programmers can quickly identify code segments that are already well-structured, as well as those that can be restructured to improve performance. In addition to identifying the sections of an application that consume most of the compute time or system resources, PGPROF gives developers specific, actionable performance optimisation feedback about their source code.
The data, presented on a per-thread and/or per-process basis, simplifies performance tuning by identifying:
- Streaming SIMD Extensions (SSE) vector loops, and why vectorisation is inhibited on non-vector loops
- Loops auto-parallelised for multi-core, and why parallelisation is inhibited on serial loops
- Loops that are candidates for OpenMP parallelisation
- Compute intensity of loops, and candidates for offloading to a GPU (Graphics Processing Unit) or accelerator
- Loops with very small or very large iteration counts, and how they can be modified to maximise performance for SSE and the cache-based memory hierarchy
- Data prefetching, and opportunities for prefetch tuning using directives and pragmas
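As a rough illustration of the first item in the list above, the two C loops below differ in whether their iterations are independent. This is a minimal sketch rather than PGI output: the function names `scale_add` and `prefix_sum` are made up for the example, and whether a given compiler actually vectorises the first loop depends on its options and target.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: each y[i] depends only on x[i] and y[i].
 * The restrict qualifiers promise that the arrays do not overlap,
 * which removes a common reason for a compiler to refuse to vectorise. */
void scale_add(size_t n, float a, const float *restrict x, float *restrict y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Loop-carried dependence: iteration i reads the value written by
 * iteration i - 1, so the iterations cannot simply be executed side
 * by side in SIMD lanes. */
void prefix_sum(size_t n, float *v)
{
    assert(v != NULL);
    for (size_t i = 1; i < n; ++i) {
        v[i] = v[i] + v[i - 1];
    }
}
```

A feedback tool of the kind described would be expected to report the first loop as a vector candidate and to name the dependence on `v[i - 1]` as the inhibitor in the second.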
In addition to these detailed analyses, PGPROF provides program-level analyses, including information about inlined functions and subroutines, how each file was compiled, comprehensive system configuration information and many other performance-critical characteristics of Fortran, C and C++ source code.
Unlike traditional performance tuning tools, which only report on and help tune performance for a specific type of processor or system, or focus solely on parallelisation, the PGI 8.0 compilers and tools provide developers with feedback and insight on how to restructure loops and algorithms to enhance performance on any modern multi-core x64 CPU or GPU accelerator.
Michael Wolfe, compiler engineer at The Portland Group, said that parallelism does not equate to performance. The focus, he said, needs to be on performance rather than on parallelism, with parallelism being one of the tools for achieving it.
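One common example of restructuring a loop for the cache-based memory hierarchy is loop interchange. The sketch below is illustrative only and is not taken from PGI's tools; the function names `sum_strided` and `sum_contiguous` are hypothetical, and the matrix is assumed to be stored row by row in a flat array.

```c
#include <assert.h>
#include <stddef.h>

/* Column-first traversal of a row-major matrix: consecutive inner
 * iterations touch elements that are cols doubles apart in memory,
 * so large matrices make poor use of each cache line. */
double sum_strided(size_t rows, size_t cols, const double *m)
{
    assert(m != NULL);
    double total = 0.0;
    for (size_t j = 0; j < cols; ++j) {
        for (size_t i = 0; i < rows; ++i) {
            total += m[i * cols + j];
        }
    }
    return total;
}

/* Interchanged loops: the inner loop now walks adjacent elements,
 * which suits both the cache and unit-stride vector loads. */
double sum_contiguous(size_t rows, size_t cols, const double *m)
{
    assert(m != NULL);
    double total = 0.0;
    for (size_t i = 0; i < rows; ++i) {
        for (size_t j = 0; j < cols; ++j) {
            total += m[i * cols + j];
        }
    }
    return total;
}
```

Both functions visit the same elements and differ only in the order of the memory accesses, which is the kind of change a loop-level feedback report can point towards.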
Provisional GPU support
PGI Release 8.0 also includes a technology preview of the industry's first Fortran and C compilers that automatically offload computations from an x64 host program to a GPU. Until now, C and C++ developers targeting GPU accelerators have had to rely on language extensions to their programs.
Use of GPUs from Fortran applications has been extremely limited. Programmers targeting x64+GPU systems have had to work at a low level: understanding and specifying data usage, and manually constructing sequences of calls to manage all movement of data between the x64 host and the GPU.
Using the provisional support in PGI Release 8.0, programmers can accelerate Linux applications on x64+GPU platforms by adding OpenMP-like compiler directives to existing high-level standard-compliant Fortran and C programs and then recompiling with appropriate compiler options.
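The announcement does not show the accelerator directives themselves, so the sketch below uses a standard OpenMP `parallel for` directive as a stand-in to illustrate the general directive-based style. The point is that the loop remains ordinary C, and a compiler that does not enable the directive still builds a correct serial program. The function name `squared_diff` is hypothetical.

```c
#include <assert.h>

/* Element-wise squared difference of two arrays.
 * The directive is a hint layered on top of standard C. It is guarded
 * by _OPENMP, the macro that OpenMP compilers define, so the same
 * source compiles cleanly with or without OpenMP enabled. */
void squared_diff(int n, const float *a, const float *b, float *out)
{
    assert(n >= 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = 0; i < n; ++i) {
        float d = a[i] - b[i];  /* private to each iteration */
        out[i] = d * d;
    }
}
```

Because every iteration writes a different element of `out` and reads nothing written by another iteration, the loop gives the same result whether it runs serially or in parallel.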
Andy Keane, general manager, Tesla computing solutions, NVIDIA, said that PGI is joining the increasing number of software publishers offering innovative approaches to harnessing the power of NVIDIA GPUs by leveraging the CUDA development environment. Given PGI’s 20-year history and track record of success, he said, NVIDIA expects PGI’s offering to open the door for members of the HPC community to begin incrementally porting large legacy production science and engineering codes to take full advantage of NVIDIA Tesla accelerators.
The PGI 8.0 x64+GPU compilers automatically analyse whole program structure and data, split portions of the application between the x64 CPU and GPU as specified by user directives, and define and generate an optimised mapping of loops to automatically use the parallel cores, hardware threading capabilities and SIMD vector capabilities of modern GPUs.
In addition to directives and pragmas that specify regions of code or functions to be accelerated, the PGI Fortran and C compilers support user directives that give the programmer fine-grained control over the mapping of loops, allocation of memory, and optimisation for the GPU memory hierarchy.
The PGI compilers generate unified x64+GPU object files and executables that manage all movement of data to and from the GPU device. They work with all existing host-side utilities (linker, librarians, makefiles) and require no changes to the existing standard HPC Linux/x64 programming environment.
Other significant new features in PGI Release 8.0 include support for OpenMP parallel and local OpenMPI parallel debugging on Mac OS X, a new simplified licensing setup on Microsoft Windows, support for Microsoft HPC Server 2008 clusters and support for the latest processors from AMD and Intel. Evaluation copies of the new PGI compilers are available from The Portland Group web site.