The impact of link-time optimization

Recently, I compared running times with and without link-time optimization. Link-time optimization (LTO) lets the compiler optimize across translation units when the program is linked, which can make programs run faster. It has to be enabled both when compiling and when linking. Both the GNU Compiler Collection (GCC) and Clang (a frontend for LLVM) call this link-time optimization. The Intel compiler offers a similar feature called interprocedural optimization (IPO). In Microsoft Visual Studio, it is called Whole Program Optimization and relies on Link-Time Code Generation (LTCG).

The dataset I used for these tests is ERS006494. It contains 186073978 Illumina DNA sequences, each 75 nucleotides long.

I used 8 nodes, each with 2 Intel Xeon X5560 processors (8 cores per node) and 24 GiB of memory. Each job therefore had 64 ranks. Storage was served by a Lustre file system. The Host Channel Adapter was a Mellanox Technologies MT26428. The version of GCC was 4.7.2. The version of Open-MPI was 1.6.3. The version of Ray was 606be2a7a710a226. The version of Ray Platform was d78e7ec5037c9c9e8a0.
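The job geometry above can be checked with a little arithmetic. This is only a sketch: it assumes one MPI rank per core, and the per-rank averages are plain divisions, not a description of how Ray actually distributes sequences or memory.

```python
# Figures taken from the text above.
nodes = 8
cores_per_node = 8
memory_per_node_gib = 24
sequences = 186073978
sequence_length = 75

# Assumption: one MPI rank per core.
ranks = nodes * cores_per_node
total_nucleotides = sequences * sequence_length
total_memory_gib = nodes * memory_per_node_gib

# Plain averages, for orientation only.
sequences_per_rank = sequences / ranks
memory_per_rank_gib = total_memory_gib / ranks

print(f"ranks:               {ranks}")
print(f"total nucleotides:   {total_nucleotides}")
print(f"sequences per rank:  {sequences_per_rank:.1f}")
print(f"memory per rank:     {memory_per_rank_gib:.1f} GiB")
```

With these numbers, each rank has on average about 2.9 million sequences and 3 GiB of memory to work with.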

The strip command was used to remove symbol information that is not needed at run time from the Ray executables. The complete command for link-time optimization with GCC is available here. The template to run the jobs is available here.

Table 1: Comparison of running times with different compilation options.

Compilation options                                       Running time
-Wall -std=c++98 -O3 -march=native                        7 hours, 14 minutes, 33 seconds
-Wall -std=c++98 -Os -march=native -flto -fwhole-program  10 hours, 39 minutes, 26 seconds
-Wall -std=c++98 -O3 -march=native -flto -fwhole-program  7 hours, 8 minutes, 36 seconds


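With -O3, enabling LTO shortened the run by about 6 minutes out of more than 7 hours (roughly 1%), so LTO apparently makes no meaningful difference when running Ray on InfiniBand. The -Os build was much slower, but it differs from the baseline in optimization level as well as in LTO, so that slowdown cannot be attributed to LTO.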
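To put the running times in Table 1 on a common scale, they can be converted to seconds and compared with the plain -O3 build. The sketch below only redoes that arithmetic. The short labels and the helper names (to_seconds, relative_change) are mine, and each configuration was timed once, so small differences should be read with caution.

```python
# Running times copied from Table 1, as (hours, minutes, seconds).
runs = {
    "-O3": (7, 14, 33),
    "-Os -flto -fwhole-program": (10, 39, 26),
    "-O3 -flto -fwhole-program": (7, 8, 36),
}


def to_seconds(hms):
    """Convert an (hours, minutes, seconds) tuple to seconds."""
    hours, minutes, seconds = hms
    return hours * 3600 + minutes * 60 + seconds


def relative_change(options, baseline="-O3"):
    """Percent change in running time relative to the baseline build."""
    base = to_seconds(runs[baseline])
    return 100.0 * (to_seconds(runs[options]) - base) / base


for options in runs:
    elapsed = to_seconds(runs[options])
    print(f"{options:28s} {elapsed:6d} s  {relative_change(options):+6.2f} %")
```

A negative percentage means the build ran faster than the -O3 baseline.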


