Coupling OpenFOAM with an Artificial Neural Network as a Reduced-Order Model for Reaction Kinetics

Image copyright: © T. Jeremy P. Karpowski, TU Darmstadt, STFS

Scientific Area

High-performance computing, computational fluid dynamics, machine learning

Short Description

The problem of global warming requires research into efficient, low-emission combustion applications. Large Eddy Simulations (LES) of turbulent reactive flows are an important numerical tool for this research. However, modeling the non-linear reaction kinetics in computational fluid dynamics (CFD) simulations of reactive flows is challenging. Simulations with highly resolved detailed chemistry are often infeasible for practical combustors due to prohibitive computational cost. One technique to reduce the computational effort is tabulated chemistry using flamelet-generated manifolds; however, this approach imposes severe memory limitations. To reduce memory usage and to enable the investigation of more complex cases, a data-driven modeling approach using machine learning is a promising alternative. In this approach, an artificial neural network (ANN) is trained to learn the non-linear relationship between thermochemical control variables and the chemical source terms.
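To make the ANN's role concrete, the following is a minimal sketch of a small fully connected network mapping thermochemical control variables to a chemical source term. The layer sizes, weights, and the function name `predictSourceTerm` are illustrative placeholders, not the trained model from the project:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A single dense (fully connected) layer: y = W * x + b, optionally
// followed by a tanh activation.
struct DenseLayer {
    std::vector<std::vector<double>> W; // weights, indexed [out][in]
    std::vector<double> b;              // biases, one per output unit
    bool activation;                    // apply tanh if true

    std::vector<double> forward(const std::vector<double>& x) const {
        std::vector<double> y(b);
        for (std::size_t i = 0; i < W.size(); ++i) {
            for (std::size_t j = 0; j < x.size(); ++j)
                y[i] += W[i][j] * x[j];
            if (activation) y[i] = std::tanh(y[i]);
        }
        return y;
    }
};

// Illustrative ANN: two control variables (e.g. progress variable and
// mixture fraction) -> two tanh hidden units -> one linear output standing
// in for a chemical source term. Weights are arbitrary placeholders.
double predictSourceTerm(const std::vector<double>& controls) {
    DenseLayer hidden{{{0.5, -0.25}, {0.1, 0.8}}, {0.0, 0.1}, true};
    DenseLayer output{{{1.0, -1.0}}, {0.05}, false};
    return output.forward(hidden.forward(controls))[0];
}
```

In the actual application, the trained network replaces the memory-hungry lookup table of the flamelet-generated-manifold approach: instead of interpolating tabulated source terms, the network is evaluated at each queried point in the control-variable space.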

Integrating this ANN into a traditional simulation code requires coupling it with technologies from artificial intelligence. One main challenge is bridging different hardware architectures.

Many traditional HPC simulation codes have grown and been optimized over years to run efficiently on parallel CPU architectures. In contrast, emerging data-driven technologies, especially deep learning, can often be significantly accelerated by GPUs or other specialized hardware. We therefore developed a coupling approach that supports such heterogeneous architectures by running a coupled application on a combination of pure CPU nodes and GPU-accelerated nodes in a single job. We implemented a distributed controller-worker pattern, in which the MPI_COMM_WORLD communicator is partitioned into one workgroup communicator per available GPU device. Inside each workgroup communicator, one MPI rank is assigned to control inference of the ML model, while all remaining ranks are workers. The workers execute only the CPU code of the main simulation; only the controller ranks launch ML kernels on the GPU.
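The partitioning described above can be sketched as follows, assuming contiguous blocks of global ranks per workgroup: each rank derives a color identifying its workgroup (one per GPU), and the lowest rank in each block acts as the controller. The helper names below are ours, not taken from the project code; in the real application the color would be passed to `MPI_Comm_split(MPI_COMM_WORLD, color, worldRank, &workgroupComm)`:

```cpp
#include <cassert>

// Color identifying the workgroup (i.e. the GPU device) a global rank
// belongs to, assuming contiguous blocks of ranks per workgroup. This is
// the "color" argument one would hand to MPI_Comm_split.
int workgroupColor(int worldRank, int worldSize, int numGpus) {
    int ranksPerGroup = (worldSize + numGpus - 1) / numGpus; // ceil division
    return worldRank / ranksPerGroup;
}

// The controller is the lowest-numbered rank of each block; it alone
// launches ML kernels on its assigned GPU. All other ranks are workers.
bool isController(int worldRank, int worldSize, int numGpus) {
    int ranksPerGroup = (worldSize + numGpus - 1) / numGpus;
    return worldRank % ranksPerGroup == 0;
}
```

For example, with 8 MPI ranks and 2 GPUs, ranks 0-3 form workgroup 0 (controller: rank 0) and ranks 4-7 form workgroup 1 (controller: rank 4).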

We implemented our coupling approach in a simplified OpenFOAM application. The following figure schematically illustrates the resulting coupled simulation loop. At the end of each simulation timestep, the controller ranks gather the input data for the ML model. Subsequently, the controller ranks perform the ML inference on their assigned GPU devices. Afterwards, each controller rank scatters the resulting output data back to the worker ranks within its workgroup communicator.
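The gather-infer-scatter step at the end of each timestep can be mocked serially to show the data flow. In the real code these are MPI collectives (e.g. `MPI_Gatherv`/`MPI_Scatterv`) on the workgroup communicator, and the inference runs the ANN on the controller's GPU; here a stub inference and plain vectors stand in, and all names are illustrative:

```cpp
#include <cassert>
#include <vector>

// Stub standing in for the batched ANN inference on the controller's GPU.
std::vector<double> inferAll(const std::vector<double>& inputs) {
    std::vector<double> out(inputs.size());
    for (std::size_t i = 0; i < inputs.size(); ++i)
        out[i] = 2.0 * inputs[i]; // placeholder for the source-term model
    return out;
}

// Serial mock of one coupled timestep inside a single workgroup:
// gather control variables from all ranks, run inference once on the
// controller, scatter each rank's slice of the results back.
void coupledTimestep(std::vector<std::vector<double>>& perRankData) {
    // 1) Gather: the controller concatenates each rank's local inputs.
    std::vector<double> gathered;
    for (const auto& local : perRankData)
        gathered.insert(gathered.end(), local.begin(), local.end());

    // 2) Inference: the controller evaluates the ML model on the whole batch.
    std::vector<double> results = inferAll(gathered);

    // 3) Scatter: the controller returns each rank's slice of the output.
    std::size_t offset = 0;
    for (auto& local : perRankData) {
        for (std::size_t i = 0; i < local.size(); ++i)
            local[i] = results[offset + i];
        offset += local.size();
    }
}
```

Batching all workers' data into one inference call is the point of the pattern: the GPU sees one large batch per workgroup per timestep rather than many small per-rank batches.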

Schematic representation of the CFD-AI coupling

Results

As a baseline for our performance evaluation, we use the coupled OpenFOAM-ML application with ML inference running only on CPUs, where each MPI rank processes its own part of the data. We investigate different machine learning frameworks on NVIDIA V100 GPUs: PyTorch 1.10.1, TensorFlow 2.6.0, and Torch-TensorRT 1.0.0. Additionally, we investigate the NEC SX-Aurora TSUBASA Vector Engine (VE) Type 10B as an accelerator device for ML inference, using the VE backend of the SOL framework 0.4.2.1. The coupled OpenFOAM-ML code is compiled with the Intel C/C++ compiler 19.0.1.144 20181018, using the standard C++ library shipped with GCC 8.2.0 and Intel MPI 2018.4.274.

The coupled application is executed as a heterogeneous job on our CLAIX-2018 cluster with a mix of compute nodes from different partitions. For each GPU-accelerated node, we also allocate one pure CPU node; each node has two CPU sockets. The following figure shows the resulting median inference times over 5 OpenFOAM timesteps for the investigated ML frameworks, with the shaded area around each line indicating the variance. Note that Torch-TensorRT does not support double-precision data types; we therefore measured with single precision and approximated the double-precision inference runtime as twice the single-precision runtime. The three frameworks running on GPUs perform very similarly. ML inference on a single GPU is as fast as on 4 pure CPU nodes (8 CPU sockets). Overall, ML inference can be accelerated by up to 4x using GPUs. To achieve the same speedup as with 1 GPU, 8 NEC Vector Engines are necessary.

Graph of inference times in different ML frameworks