NAMD on GPUs vs CPUs
The benchmark was run for two proteins of different sizes in a rectangular water box (TIP3P). Both BSA (bovine serum albumin, 585 residues) and HEWL (hen egg white lysozyme, 129 residues) were first neutralized with NaCl at 0.02 M ionic strength. The proteins were then solvated, giving final system sizes of 250,000 atoms for BSA and 80,000 atoms for HEWL. Minimization and equilibration of both systems were completed before the GPU performance test, so the benchmark was run on a normal production trajectory.
Parameters used: timestep 2 fs; VdW cut-off 12 Å; PME; temperature 300 K; rigid bonds and angles in water; 100,000 simulation steps.
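As a rough illustration of how these parameters map onto NAMD input, the sketch below prints the corresponding NAMD configuration keywords (`timestep`, `cutoff`, `PME`, `temperature`, `rigidBonds`, `run`). It is only a fragment under stated assumptions: the helper name `namd_config_lines` is hypothetical, and a real input file also needs structure and coordinate files, the periodic cell, PME grid settings and thermostat options, none of which are given in the benchmark description.

```python
# Hypothetical sketch: emit NAMD simulation-control keywords matching the
# benchmark parameters above. Assumption: this is NOT a complete NAMD input;
# structure/coordinates, periodic cell, PME grid and thermostat settings are omitted.

BENCHMARK_PARAMS = {
    "timestep": 2.0,        # fs
    "cutoff": 12.0,         # Å, van der Waals cut-off
    "PME": "yes",           # particle-mesh Ewald electrostatics
    "temperature": 300,     # K; in NAMD this sets the initial velocities only
    "rigidBonds": "water",  # rigid bonds and angles in water
}
N_STEPS = 100_000


def namd_config_lines(params: dict, n_steps: int) -> list[str]:
    """Return 'keyword value' lines in insertion order, ending with a 'run' line."""
    lines = [f"{key} {value}" for key, value in params.items()]
    lines.append(f"run {n_steps}")
    return lines


if __name__ == "__main__":
    print("\n".join(namd_config_lines(BENCHMARK_PARAMS, N_STEPS)))
```

Holding the system at 300 K during the run would additionally require a thermostat (for example Langevin dynamics), whose settings the benchmark does not state.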
The same simulation was run on 1 to 8 CPU nodes and on 1 to 6 GPU nodes. Note that on ARCHIE-WeSt each CPU node has 12 cores.
The plot below shows the wall-clock running time (in seconds) of all simulations vs the number of CPU/GPU nodes used. The blue and red lines show the 250k-atom (BSA) system on CPUs and GPUs respectively, while the green and purple lines show the same for the smaller system (80k atoms, HEWL).
The main benefit of using GPUs instead of CPUs appears when 1 or 2 nodes are used. For the large 250k-atom system, the running time on 1 CPU node is 35,000 s, while on 1 GPU node it is only 20,000 s, a speedup of 1.75. On 2 nodes, the running time is 18,500 s on CPUs and 12,000 s on GPUs (speedup 1.5). On 3 nodes the running times on CPU and GPU are almost the same, and above 3 nodes the GPU runs take even longer than the CPU runs.
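The speedup figures above are simply the CPU wall-clock time divided by the GPU wall-clock time for the same number of nodes. A minimal sketch of that arithmetic, using the approximate times quoted for the BSA system (the function name `speedup` is illustrative only):

```python
# Speedup = CPU wall-clock time / GPU wall-clock time for the same node count.
# Times (s) are the approximate values quoted in the text for the 250k-atom BSA system.

bsa_wall_clock_s = {
    # nodes: (CPU time, GPU time)
    1: (35_000, 20_000),
    2: (18_500, 12_000),
}


def speedup(cpu_s: float, gpu_s: float) -> float:
    """GPU speedup over CPU; values > 1 mean the GPU run finished sooner."""
    return cpu_s / gpu_s


for nodes, (cpu_s, gpu_s) in bsa_wall_clock_s.items():
    print(f"{nodes} node(s): speedup {speedup(cpu_s, gpu_s):.2f}")
```

This prints 1.75 for 1 node and 1.54 for 2 nodes; the text rounds the latter to 1.5. A value below 1 would correspond to the regime above 3 nodes, where the GPU runs are slower.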
For the smaller 80,000-atom system (HEWL), the benefit of using 1 or 2 GPU nodes is smaller than above, because there are fewer non-bonded interactions, which are the only ones calculated on the GPUs. The general trend is nevertheless the same, and it is not worth using more than 3 GPU nodes.
The real CPU time used (burn) by the above simulations is shown in the plot below:
For both systems (though more clearly for the large one), less CPU time is used when the simulation runs on 1 or 2 GPU nodes than when the same simulation runs on 1 or 2 CPU nodes. For the small system, the difference in CPU time usage decreases as the number of nodes increases and disappears at 6 nodes. For the large system the situation is more dramatic: with more than 3 GPU nodes, CPU time usage is much higher than on the same number of CPU nodes. It is also worth noting that, on CPU nodes, CPU time usage is almost flat for both systems and depends little on the number of CPU nodes used per simulation.
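One way to see why the CPU-node curves stay nearly flat is to estimate CPU time as wall-clock time × nodes × cores per node. The sketch below does this for the BSA CPU-node runs, under the assumption that all 12 cores of every ARCHIE-WeSt CPU node are busy for the whole run (the batch system's own accounting may differ). GPU nodes are left out because their core counts are not given here; the function names are hypothetical.

```python
# Sketch: estimated CPU time ("burn") of a run on CPU nodes.
# Assumption: all 12 cores per CPU node are busy for the whole run.

CORES_PER_CPU_NODE = 12


def cpu_core_seconds(wall_clock_s: float, nodes: int,
                     cores_per_node: int = CORES_PER_CPU_NODE) -> float:
    """Core-seconds consumed: wall-clock time x nodes x cores per node."""
    return wall_clock_s * nodes * cores_per_node


def parallel_efficiency(t1_s: float, tn_s: float, nodes: int) -> float:
    """Fraction of ideal linear scaling on `nodes` nodes relative to 1 node."""
    return t1_s / (nodes * tn_s)


# Approximate BSA (250k atoms) CPU-node wall-clock times from the text.
t1, t2 = 35_000, 18_500
print(cpu_core_seconds(t1, 1))                   # 420000
print(cpu_core_seconds(t2, 2))                   # 444000
print(f"{parallel_efficiency(t1, t2, 2):.2f}")   # 0.95
```

Going from 1 to 2 CPU nodes increases the estimated CPU time by only about 6%, because the wall-clock time nearly halves (about 95% parallel efficiency). This is consistent with the almost flat CPU-time curves described above.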
- Bigger system, bigger benefit of using GPUs.
- The recommended number of GPU nodes per simulation is 1 or 2.
- It is not worth using more than 3 GPU nodes per simulation.
- For long simulations (>50,000), it is more efficient to use multiple CPU nodes (for example 10) rather than GPUs.
- For numerous short simulations, 1 or 2 GPU nodes per run are recommended.
- In terms of CPU usage, it is not worth using more than 3 GPU nodes.
- Simulations on GPUs are more efficient for big systems, i.e. the larger the number of atoms in the system, the greater the benefit of using GPUs.
- GPUs are not recommended for systems of ~50,000 atoms or fewer; a reasonable system size for GPU runs is >100,000 atoms.
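The recommendations above can be condensed into a simple rule of thumb. The function below is a hypothetical helper, not an ARCHIE-WeSt tool: it encodes only the thresholds stated in this benchmark for NAMD on ARCHIE-WeSt, and because the unit of the ">50,000" long-simulation threshold is not stated, run length is passed as a plain yes/no flag.

```python
# Hypothetical helper encoding the benchmark's rules of thumb.
# Thresholds come from this NAMD benchmark on ARCHIE-WeSt; they are not general limits.


def suggest_resources(n_atoms: int, long_run: bool) -> str:
    """Map system size and run length to the recommendations listed above."""
    if n_atoms <= 50_000:
        # GPUs are not recommended for ~50,000 atoms or fewer.
        return "CPU nodes"
    if n_atoms <= 100_000:
        # Between the thresholds (e.g. the 80k-atom HEWL system) the GPU benefit is smaller.
        return "CPU nodes or 1-2 GPU nodes"
    if long_run:
        # Long simulations: multiple CPU nodes (for example 10) are more efficient.
        return "multiple CPU nodes"
    # Numerous short runs on a large system: 1 or 2 GPU nodes per run, never more than 3.
    return "1-2 GPU nodes per run"


for atoms, is_long in [(80_000, False), (250_000, False), (250_000, True)]:
    print(atoms, is_long, "->", suggest_resources(atoms, is_long))
```

The band between 50,000 and 100,000 atoms is deliberately left open, since the benchmark only shows that the GPU benefit there is smaller, not absent.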
For more details, please contact firstname.lastname@example.org or contact Dr Karina Kubiak-Ossowska (email@example.com), ARCHIE-WeSt user support officer, directly.