Scientific Area
Injection molded plastic components, semicrystalline thermoplastics
Short Description
Injection molding is very important in plastics processing since it enables cheap and easily adaptable large-scale production of plastic components. As the microstructure of semicrystalline thermoplastics has a strong influence on their mechanical properties, the Institute for Plastics Processing at RWTH Aachen University (IKV) has developed the simulation software SphäroSim, which predicts the microstructure of semicrystalline thermoplastics during solidification. Due to the long runtime of the simulation, high-resolution simulations are restricted to small areas. In 2014, SphäroSim was parallelized to simulate large data sets on an HPC system [1], but algorithmic changes require further performance optimization. In the context of this project, we decreased the runtime and reduced the memory consumption. The simulation now runs on a single compute node without running out of memory for the data set currently under investigation. Since the program was developed under Windows, we also ported it to Linux to use the capacities of the Linux Cluster of RWTH Aachen University.
Our performance analysis with Intel VTune Amplifier revealed functions that do not make use of the capacities of parallel computer architectures. We moved some instructions out of a large loop and removed some unnecessary mutex locks that prevented parallelization. We also identified long idle times while intermediate data was written to disk, and we reduced them by introducing asynchronous parallel I/O with OpenMP tasks. Furthermore, we updated the Eigen library used by the code and added a compiler flag for aggressive inlining. By analyzing the memory consumption of the application, we identified a very large data structure that allocated a few million memory pages but used only about a quarter of each page. Adapting the size of this structure to the size actually used enabled us to activate a precalculation step for frequently used data, which had been implemented previously but could not be used for lack of memory. Using the precalculation also reduced the runtime.
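The idea of overlapping computation with output through OpenMP tasks can be illustrated with a minimal sketch. It is not taken from SphäroSim: the functions compute_step, write_snapshot and run_simulation, the file naming, and the data layout are hypothetical and chosen only for illustration. One thread produces the data of each step and hands a private copy to a task, so that other threads of the team can write it to disk while the next step is already being computed.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one simulation step (not SphäroSim code).
std::vector<double> compute_step(int step, std::size_t n) {
    std::vector<double> data(n);
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<double>(step) + 0.5 * static_cast<double>(i);
    }
    return data;
}

// Writes one snapshot to its own file, so concurrent writers never share a stream.
void write_snapshot(const std::string& path, const std::vector<double>& data) {
    std::ofstream out(path, std::ios::binary);
    out.write(reinterpret_cast<const char*>(data.data()),
              static_cast<std::streamsize>(data.size() * sizeof(double)));
}

// Runs `steps` iterations and returns the number of snapshots handed to tasks.
int run_simulation(int steps, std::size_t n, const std::string& prefix) {
    assert(steps >= 0);
    int scheduled = 0;
    #pragma omp parallel
    #pragma omp single
    {
        // Only one thread executes this loop; the rest of the team picks up tasks.
        for (int step = 0; step < steps; ++step) {
            std::vector<double> snapshot = compute_step(step, n);
            std::string path = prefix + std::to_string(step) + ".bin";

            // firstprivate gives the task its own copy of the buffer and path,
            // so the loop can go on to the next step without waiting for the write.
            #pragma omp task firstprivate(snapshot, path)
            write_snapshot(path, snapshot);

            ++scheduled;
        }
        // Wait until all outstanding writes have finished.
        #pragma omp taskwait
    }
    return scheduled;
}
```

A call such as run_simulation(4, 16, "snapshot_") would write four files, snapshot_0.bin to snapshot_3.bin. The sketch has to be compiled with OpenMP enabled (for example -fopenmp with GCC); without it the pragmas are ignored and the writes simply run serially. Copying the buffer into each task trades memory for overlap, which may or may not be acceptable for large intermediate data.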
Results
As the baseline for the performance evaluation, we use the code version Init, which is parallelized with OpenMP and ported to Linux. It is compiled with the Intel 18.0 compiler and takes about 20 hours with 48 threads to complete on a CLAIX-2018 compute node. The following figure shows the runtimes after the optimization steps described above. All measurements were taken on a single node of the CLAIX-2018 compute cluster of RWTH Aachen University. Each node contains 48 Intel Skylake cores operating at a frequency of 2.1 GHz and 192 GB of main memory.
Removing the mutex locks and moving some instructions out of the loop (version rmSerMov) reduces the runtime by 30% (compared to Init). Introducing asynchronous parallel I/O (version ParIO) saves another 10% (compared to Init). The update of the Eigen library and aggressive inlining (version Eig) reduce the runtime by an additional 5%. Enabling the precalculation (version Precalc) improves the runtime by another 37%. This version is 77% faster than the initial version, and simulations now take only roughly 5 hours to complete. This means that simulation runs can complete overnight.
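As a quick consistency check of the overall figures, the final runtime can be estimated from the baseline. This assumes that the 77% refers to a reduction in runtime relative to Init and uses the approximate baseline of 20 hours; the symbols t_Init and t_Precalc are introduced here only for illustration.

```latex
\[
  t_{\mathrm{Precalc}} \approx (1 - 0.77)\, t_{\mathrm{Init}}
    = 0.23 \times 20\,\mathrm{h} = 4.6\,\mathrm{h},
  \qquad
  \frac{t_{\mathrm{Init}}}{t_{\mathrm{Precalc}}} \approx \frac{1}{0.23} \approx 4.3
\]
```

Under this assumption the estimate agrees with the reported runtime of roughly 5 hours and corresponds to a speedup of a little more than a factor of four.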