Quantum Mechanical Simulation of Impurities and Defects Embedded in Materials for Quantum Information Technology

Figure: Schematic representation. Copyright: © Philipp Rüßmann, Forschungszentrum Jülich

Scientific Area

Quantum mechanics, density functional theory

Short description

The JuKKR package is developed at Forschungszentrum Jülich and contains a collection of codes for electronic structure calculations based on the Korringa-Kohn-Rostoker Green's function method. One part of JuKKR is the KKRimp code, which enables the quantum mechanical simulation of impurities and defects embedded in a variety of materials. This provides detailed insights into their electronic and magnetic properties and makes it possible to describe the scattering of electrons off defect atoms. Recently, the code has been extended to describe defects embedded in superconductors, which is an important ingredient in the search for materials suitable for future stable quantum computing applications.

We conducted a performance assessment of the KKRimp code using the tools Score-P, Scalasca, Cube and Vampir. As part of the assessment, we calculated the metrics for hybrid parallel applications proposed by the POP2 project [3]. Using these POP metrics, we identified parts of the code where the existing OpenMP parallelization has potential for optimization.
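To illustrate the kind of metrics involved, the sketch below computes the basic POP efficiencies (load balance, communication efficiency, and their product, the parallel efficiency) from per-process timings. The function name and the timing values are hypothetical and are not taken from the KKRimp measurements. The hybrid MPI+OpenMP metrics proposed by POP2 refine this scheme further and are not reproduced here.

```python
from statistics import mean


def pop_metrics(useful_times, runtime):
    """Basic POP efficiencies from per-process useful computation times.

    useful_times: seconds each process spent in useful computation
                  (i.e. outside the parallel runtime)
    runtime:      wall-clock runtime of the measured region in seconds
    """
    if not useful_times or runtime <= 0:
        raise ValueError("need at least one timing and a positive runtime")

    avg_useful = mean(useful_times)
    max_useful = max(useful_times)

    # How evenly the useful work is spread over the processes.
    load_balance = avg_useful / max_useful
    # How much of the runtime the busiest process spends on useful work.
    communication_efficiency = max_useful / runtime
    # Product of the two, equal to avg_useful / runtime.
    parallel_efficiency = load_balance * communication_efficiency

    return {
        "load_balance": load_balance,
        "communication_efficiency": communication_efficiency,
        "parallel_efficiency": parallel_efficiency,
    }


# Hypothetical timings for four processes (illustrative values only).
example = pop_metrics([8.0, 6.0, 7.0, 7.0], runtime=10.0)
```

A low parallel efficiency combined with a good load balance points to time lost outside useful computation, for example threads idling while only the main thread does the work.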

One candidate for optimization is the routine rhooutnew, which integrates density matrices over a discrete set of radial points. The set of radial points is already distributed across the MPI processes. However, the OpenMP part of the hybrid parallelization is limited to calls to linear algebra routines, e.g., zgemm, from the parallel version of the Intel Math Kernel Library (MKL). Thus, a significant part of this routine is still executed serially by the main thread of each MPI process, leading to inefficient utilization of the allocated computing resources.

To improve the performance of the code, we changed the OpenMP parallelization of this routine so that the entire integration loop over the radial points is parallelized with OpenMP, not only parts of it. At the same time, all calls to linear algebra routines such as zgemm from the parallel Intel MKL were replaced by calls to the serial version to avoid nested parallelism among the OpenMP threads.
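The restructuring can be pictured with the following sketch. It is a Python stand-in, not the Fortran/OpenMP code of KKRimp: the integrand `point_contribution`, the weights, and the 2x2 matrices are hypothetical placeholders. It shows only the pattern described above: the loop over radial points is the unit of parallel work, the matrix product for each point runs serially inside its worker, and the weighted contributions are summed at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul(a, b):
    """Serial matrix product, standing in for a call to a serial zgemm."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def point_contribution(point):
    """Weighted contribution of one radial point (hypothetical integrand)."""
    weight, left, right = point
    product = matmul(left, right)
    return [[weight * value for value in row] for row in product]


def integrate(points, workers=4):
    """Sum the contributions of all radial points.

    The outer loop over points is distributed over the workers;
    no parallelism is used inside a single point.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        contributions = list(pool.map(point_contribution, points))

    n_rows = len(contributions[0])
    n_cols = len(contributions[0][0])
    total = [[0j] * n_cols for _ in range(n_rows)]
    for contribution in contributions:  # reduction over all points
        for i in range(n_rows):
            for j in range(n_cols):
                total[i][j] += contribution[i][j]
    return total


# Two hypothetical radial points: (weight, left matrix, right matrix).
identity = [[1, 0], [0, 1]]
points = [
    (0.5, identity, [[1 + 1j, 2], [3, 4]]),
    (0.5, identity, [[1 - 1j, 0], [1, 2]]),
]
result = integrate(points)
```

Because of the global interpreter lock in standard CPython, this sketch illustrates the structure of the loop rather than an actual speedup. In the OpenMP setting, keeping the inner linear algebra serial is what prevents each thread from spawning further threads of its own.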

Results

As the baseline for our performance evaluation, we use the original code, in which the integration in the routine rhooutnew is parallelized with OpenMP only through calls to the parallel Intel MKL. The code was compiled using the Intel Fortran Compiler 2021.5.0 and Intel MPI Suite 2021.3.1 and executed on the CLAIX-2018 cluster. For execution, we chose 16 MPI processes and a varying number of OpenMP threads per process. The following figure compares the runtime of the rhooutnew routine in the base version (ref) and in our optimized version (opt).

Figure: Bar chart of the runtime of the rhooutnew routine for different numbers of OpenMP threads per MPI process.

Our optimized OpenMP parallelization of the integration loop in the routine rhooutnew is approximately 70% faster than the original implementation. For the whole KKRimp application, this results in a speedup of approximately 5%.
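The gap between the gain for the routine and the gain for the whole application follows from Amdahl's law: if a routine accounts for a fraction f of the total runtime and is accelerated by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The sketch below evaluates this relation. The runtime fraction of rhooutnew is not stated here, so the value used is purely hypothetical, and "70% faster" is read as a 70% reduction of the routine's runtime, which is an assumption.

```python
def overall_speedup(fraction, routine_speedup):
    """Amdahl's law: overall speedup when a part of the runtime is accelerated.

    fraction:        share of the total runtime spent in the routine (0..1)
    routine_speedup: factor by which the routine itself becomes faster (> 0)
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if routine_speedup <= 0.0:
        raise ValueError("routine_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / routine_speedup)


# Assumption: "70% faster" means the routine's runtime drops to 30%.
routine_speedup = 1.0 / 0.3
# Hypothetical share of the total runtime spent in the routine.
fraction = 0.07

total = overall_speedup(fraction, routine_speedup)  # about 1.05
```

Under these assumptions, a routine that takes only a few percent of the total runtime can yield an overall gain of roughly 5% even when the routine itself becomes several times faster.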

Code Details

Lines of code: ~130,000

Programming languages & models: Fortran, MPI, OpenMP

Development & Cooperation

Dr. Philipp Rüßmann, Institute for Advanced Simulation, Forschungszentrum Jülich

Fabian Orland, IT Center / Chair for High-Performance Computing, RWTH Aachen University

References

[1] Bauer D.S.G., Development of a relativistic full-potential first-principles multiple scattering Green function method applied to complex magnetic textures of nano structures at surfaces, PhD thesis (2013).

[2] Rüßmann P. and Blügel S., Density functional Bogoliubov-de Gennes analysis of superconducting Nb and Nb(110) surfaces, Phys. Rev. B 105, 125143 (2022). doi: 10.1103/PhysRevB.105.125143

[3] Performance Optimisation and Productivity (POP), Learning Material / Documentation, last accessed: 16th January, 2023