class: center, middle Computing Discussions # De-Mystifying some C/C++ & Fortran compiler flags ### Things you should know about compilers -
whether you wrote the code yourself or not. Friday October 19th 2018 by Oliver Stueker .footnote[There's something for GROMACS users as well.] --- # Outline 1. Motivation 1. Example Code 1. The C-Pre-Processor: `cpp` 1. Preprocessor directives 1. What does that have to do with GROMACS? 1. The Linker: `ld` 1. Compiling just the Object 1. Linking by hand 1. Shared object dependencies 1. LD_LIBRARY_PATH 1. rpath 1. Putting it all together 1. Comparing `gcc` and `icc` 1. Conclusion --- ## Motivation > I've downloaded this code/program but when I try to compile it, > it gives this error that I don't understand. --- class: ### Example Code (1/2): ```C #include "hdf5.h" #include "hdf5_hl.h" #define RANK 2 int main( void ) { printf("Writing data...\n"); hid_t file_id; hsize_t dims1[RANK]={2,3}; int data1[6]={1,2,3,4,5,6}; /* create a HDF5 file */ file_id = H5Fcreate ("ex_lite1.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT); /* create and write an integer type dataset named "dset" */ H5LTmake_dataset(file_id,"/dset",RANK,dims1,H5T_NATIVE_INT,data1); /* close file */ H5Fclose (file_id); ... ``` .footnote[Example based on https://support.hdfgroup.org/HDF5/Tutor/h5lite.html (`ex_lite1.c`)] --- class: ### Example Code (2/2): ```C ... printf("Reading data...\n"); int data2[6]; hsize_t dims2[2]; size_t i, j, nrow, n_values; /* open file from ex_lite1.c */ file_id = H5Fopen ("ex_lite1.h5", H5F_ACC_RDONLY, H5P_DEFAULT); /* read dataset */ H5LTread_dataset_int(file_id,"/dset",data2); /* get the dimensions of the dataset */ H5LTget_dataset_info(file_id,"/dset",dims2,NULL,NULL); /* print it by rows */ n_values = (size_t)(dims2[0] * dims2[1]); nrow = (size_t)dims2[1]; for (i=0; i<n_values/nrow; i++ ) { for (j=0; j<nrow; j++) printf (" %d", data2[i*nrow + j]); printf ("\n"); } /* close file */ H5Fclose (file_id); return 0; } ``` .footnote[Example based on https://support.hdfgroup.org/HDF5/Tutor/h5lite.html (`ex_lite2.c`)] --- class: ### First Try: ```console $ module load gcc $ module list Currently Loaded Modules: 1) nixpkgs/16.09 (S) 3) StdEnv/2016.4 (S) 5) gcc/5.4.0 (t) 2) imkl/11.3.4.258 (math) 4) gcccore/.5.4.0 (H) 6) openmpi/2.1.1 (m) Where: S: Module is Sticky, requires --force to unload or purge m: MPI implementations / Implémentations MPI math: Mathematical libraries / Bibliothèques mathématiques t: Tools for development / Outils de développement H: Hidden Module $ gcc hdf5_example_1.c hdf5_example_1.c:14:18: fatal error: hdf5.h: No such file or directory compilation terminated. ``` -- It can't find the `hdf5.h` header file. -- What does `#include "hdf5.h"` even mean? --- class: ### Intermezzo 1: ## The C-Pre-Processor: `cpp` > The C-Preprocessor `cpp` processes a number of directives > that are present in the source files. > > Modified versions if the source files are then passed to the compiler. **Examples:** -- ```C #include "hdf5.h"` ``` -- ```C #define RANK 2 ``` -- ```C #ifdef MPI #include "mpi.h" #endif ``` --- class: ### The `#include` directive ```C #include "hdf5.h"` ``` The content of the file `hdf5.h` is pasted into the source file at the location of the `#include` statement. -- #### When/why would I want to do that? -- * When you need to have the same lines of code in different source files. Only need to manage a single copy instead of multiple. * When a single source file gets too long. -- #### Example: C-Header files (`*.h`) describe the signature of public functions (API) so that the compiler can check whether the right arguments are passed (number, types) and whether there is a return value and if yes, of what type. -- * `#include "hdf5.h" ` ⇒ starts looking relative to the current file * `#include
` ⇒ starts looking in system directories --- class: ### The `#define` directive If the preprocessor encounters: ```C #define TABLE_SIZE 100 int table1[TABLE_SIZE]; int table2[TABLE_SIZE]; ``` ... it will pass it to the compiler as: ```C int table1[100]; int table2[100]; ``` Note: TABLE_SIZE is not a variable! -- #### Can also be used to define macros: ```C #define getmax(a,b) ((a)>(b)?(a):(b)) int main() { int x=5, y; y= getmax(x,2); cout << y << endl; return 0; }``` --- class: split-50 ### Conditional inclusions .column[ #### `#ifdef` and `#ifndef` Include file `mpi.h` only if name MPI has been defined: ```C #ifdef MPI #include "mpi.h" #endif ``` Using `cpp -DMPI` or `gcc -DMPI` will define `MPI`. ---- If `TABLE_SI ZE` has not been previously defined, define it as 100: ```C #ifndef TABLE_SIZE #define TABLE_SIZE 100 #endif int table[TABLE_SIZE]; ``` ] .column[ ---- #### `#if`, `#endif`, `#else` and `#elif` Truncate `TABLE_SIZE` to values between 50 and 200: ```cpp #if TABLE_SIZE>200 #undef TABLE_SIZE #define TABLE_SIZE 200 #elif TABLE_SIZE<50 #undef TABLE_SIZE #define TABLE_SIZE 50 #else #undef TABLE_SIZE #define TABLE_SIZE 100 #endif int table[TABLE_SIZE]; ``` ] --- class: ### What does that have to do with GROMACS? -- Ever noticed the following in your `topol.top` file? ```console [...] ; Include forcefield parameters #include "oplsaa.ff/forcefield.itp" ; Include chain topologies #include "topol_LDH.itp" ; Include water topology #include "oplsaa.ff/tip3p.itp" #ifdef POSRES_WATER ; Position restraint for each water oxygen [ position_restraints ] ; i funct fcx fcy fcz 1 1 1000 1000 1000 #endif [...] ``` -- And "`define = -DPOSRES`" line in your `.mdp` file? -- GROMACS uses Preprocessor directives within `grompp` to piece-together a "grand" Forcefield/Topology file (``.top`) from a number of smaller `.itp` (Included ToPology) files. --- class: ### So where the header files? ```console $ ll total 993 -rw-r----- 1 stuekero hdf5_example_1.c lrwxrwxrwx 1 stuekero HDF5_gcc -> /cvmfs/.../gcc5.4/hdf5/1.10.1 lrwxrwxrwx 1 stuekero HDF5_intel -> /cvmfs/.../intel2016.4/hdf5/1.8.18 ``` -- ```console $ ls HDF5_gcc/include/hdf5*h HDF5_gcc/include/hdf5.h HDF5_gcc/include/hdf5_hl.h ``` -- ```console $ gcc -IHDF5_gcc/include hdf5_example_1.c /tmp/ccEPK9Kf.o: In function `main': hdf5_example_1.c:(.text+0x5f): undefined reference to `H5check_version' hdf5_example_1.c:(.text+0x64): undefined reference to `H5open' hdf5_example_1.c:(.text+0x7d): undefined reference to `H5Fcreate' [...] collect2: error: ld returned 1 exit status ``` #### At least we now get a different error message. --- class: split-50 ### Summary of the C Preprocessor .column[ #### Preprocessor Flags: Flags can be used with `cpp` or the compiler (which then are passed to `cpp`) * `-I` - additional `include` directory * `-I/path/to/dir/include` * `-D` - define names on the command line (instead of `#define`) * `-DMYDEFINITION` * `-DNUMDIM=3` ] .column[ ---- #### Preprocessor Directives: * `#include` * `#define` * `#ifdef` * `#ifndef` * `#if` * `#endif` * `#else` * `#elif` ] --- class: ### Let's look at the error again: ```console $ gcc -IHDF5_gcc/include hdf5_example_1.c /tmp/ccEPK9Kf.o: In function `main': hdf5_example_1.c:(.text+0x5f): undefined reference to `H5check_version' hdf5_example_1.c:(.text+0x64): undefined reference to `H5open' [...] collect2: error: ld returned 1 exit status ``` -- #### This is not a compiler error! -- #### This is an error from the Linker `ld`! --- class: ### Intermezzo 2: ## The Linker: `ld` > ld combines a number of object and archive files, relocates their data > and ties up symbol references. Usually the last step in compiling a > program is to run ld. .footnote[from gnu linker ld Documentation (GNU Binutils)
] -- Can we use our compiler to just generate the "object file" and stop? --- class: ### Compiling just the Object: `gcc` has an option `-c` "Compile and assemble, but do not link" Let's try that: -- ```console $ ls hdf5_example_1.c HDF5_gcc HDF5_intel ``` -- ```console $ gcc -c -I HDF5_gcc/include/ hdf5_example_1.c ``` -- ```console $ ls hdf5_example_1.c hdf5_example_1.o HDF5_gcc HDF5_intel ``` --- class: ### Linking by hand ```console $ ld -o hdf_example.exe -emain \ hdf5_example_1.o \ HDF5_gcc/lib/libhdf5.so \ HDF5_gcc/lib/libhdf5_hl.so $ ls *.exe hdf_example.exe ``` -- or with gcc: ```console $ gcc -o hdf_example.exe \ hdf5_example_1.o \ HDF5_gcc/lib/libhdf5.so \ HDF5_gcc/lib/libhdf5_hl.so ``` -- We can refer to the external libraries using a different notation: ```console $ gcc -o hdf_example.exe \ hdf5_example_1.o \ -l hdf5 -l hdf5_hl \ -L HDF5_gcc/lib/ ``` --- class: ### Let's run the program: ```console $ ./hdf_example.exe ./hdf_example.exe: error while loading shared libraries: libhdf5.so.101: cannot open shared object file: No such file or directory ``` -- #### Intermezzo 3: ### ldd - print shared object dependencies ```console $ ldd hdf_example.exe linux-vdso.so.1 (0x00002b2108ffe000) libhdf5.so.101 => not found libhdf5_hl.so.100 => not found libc.so.6 => /cvmfs/.../lib/libc.so.6 (0x00002b2109200000) /cvmfs/.../lib/ld-linux-x86-64.so.2 (0x00002b2108fdb000) ``` --- class: ### Option 1: `LD_LIBRARY_PATH` ```console $ echo $LD_LIBRARY_PATH /opt/software/slurm/lib # export LD_LIBRARY_PATH="./HDF5_gcc/lib:$LD_LIBRARY_PATH" # better: $ export LD_LIBRARY_PATH="$PWD/HDF5_gcc/lib:$LD_LIBRARY_PATH" $ ./hdf_example.exe Writing data... Reading data... 1 2 3 4 5 6 ``` -- > In Linux, the environment variable LD_LIBRARY_PATH is a colon-separated > set of directories where libraries should be searched for first, before > the standard set of directories[...]. -- .alert-box.warning[ Don't just overwrite your `LD_LIBRARY_PATH` as other programs might rely on the libraries in those directories. Instead expand it by keeping the original PATH at the end. ] --- class: ### Option 2: rpath (RUNPATH) > **rpath** (RUNPATH) designates the run-time search path hard-coded in an > executable file or library. > Dynamic linking loaders use the rpath to find required libraries. ```console $ export LD_LIBRARY_PATH="/opt/software/slurm/lib" $ gcc -o hdf_example.exe \ hdf5_example_1.o \ -l hdf5 -l hdf5_hl \ -L HDF5_gcc/lib/ \ -Wl,-rpath,$(pwd -P)/HDF5_gcc/lib $ ./hdf_example.exe Writing data... Reading data... 1 2 3 4 5 6 ``` -- The *rpath* is added to the executable: ```console $ objdump -a -x hdf_example.exe | grep RUNPATH RUNPATH /project/6002406/tutorial/hdf5/HDF5_gcc/lib:$ORIGIN/../lib:↴ ↳$ORIGIN/../lib64:/cvmfs/.../var/nix/profiles/gcc-5.4.0/lib64 ``` --- ### Let's look again with `ldd`: ```console $ ldd hdf_example.exe linux-vdso.so.1 (0x00002add168d8000) libhdf5.so => /project/.../HDF5_gcc/lib/libhdf5.so (0x00002add16ada000) libhdf5_hl.so => /project/.../HDF5_gcc/lib/libhdf5_hl.so (0x00002add1702e000) libc.so.6 => /cvmfs/.../lib/libc.so.6 (0x00002add17250000) libz.so.1 => /cvmfs/.../lib/libz.so.1 (0x00002add175ee000) libdl.so.2 => /cvmfs/.../lib/libdl.so.2 (0x00002add17804000) libm.so.6 => /cvmfs/.../lib/libm.so.6 (0x00002add17a08000) libpthread.so.0 => /cvmfs/.../lib/libpthread.so.0 (0x00002add17d0d000) /cvmfs/.../lib/ld-linux-x86-64.so.2 (0x00002add168b5000) ``` Now all libraries are found. --- class: split-40 ## Putting it all together .column[ ```console $ gcc hdf5_example_1.c \ -D RANK=2 \ -I HDF5_gcc/include/ \ -l hdf5 -l hdf5_hl \ -L HDF5_gcc/lib/ \ -o hdf_example.exe \ -Wl,-rpath,$(pwd)/HDF5_gcc/lib ``` ] -- .column[ ---- ```plain compiler: the source file preprocessor: instead of #define in source preprocessor: location of header-files linker: which libraries to link linker: where to find libraries linker: name of the output file linker: set rpath ``` ] --
Let's look at: **`-Wl,-rpath,$(pwd)/HDF5_gcc/lib`**: * `-Wl` ⇒ pass the following option to the linker. * `-rpath` ⇒ add to the rpath... * `$(pwd)/HDF5_gcc/lib` ⇒ ...this directory --- class: ### But what are the benefits of separating Compilation from Linking? * Imagine finding something in a source-file that has 500'000 lines of code. -- * → split it into ~500 files with ~1000 lines each. -- * Now imagine compliling the whole thing takes 5 minutes. -- * You change a single file. -- * Do you really have to re-compile everything? -- * I.e. wait 5 min to be able to test? -- * → Just re-compile the files that have changed... -- * → ...and link them to a new binary. -- * You can write a `Makefile` to do just that. -- #### This will be the topic for another workshop. --- # And now with Intel Compiler ```console $ module purge $ module list Currently Loaded Modules: 1) nixpkgs/16.09 (S) 4) ifort/.2016.4.258 (H) 7) openmpi/2.1.1 (m) 2) icc/.2016.4.258 (H) 5) intel/2016.4 (t) 8) StdEnv/2016.4 (S) 3) gcccore/.5.4.0 (H) 6) imkl/11.3.4.258 (math) ``` -- ```console $ icc hdf5_example_1.c -DRANK=2 -IHDF5_intel/include/ \ -lhdf5 -lhdf5_hl -LHDF5_intel/lib/ \ -ohdf_example.exe -Wl,-rpath,$(pwd)/HDF5_intel/lib $ ./hdf_example.exe Writing data... Reading data... 1 2 3 4 5 6 ``` -- This is what we used for `gcc`: ```console $ gcc hdf5_example_1.c -DRANK=2 -IHDF5_gcc/include/ \ -lhdf5 -lhdf5_hl -LHDF5_gcc/lib/ \ -ohdf_example.exe -Wl,-rpath,$(pwd)/HDF5_gcc/lib ``` --- class: ## Summary * Compiling a program has different stages: * Preprocessing ⇒ Compiling ⇒ Linking * We can pass options to the Preprocessor: * `-D` definitions * `-I` location of `include` files * can be passed via environment variable `CPATH` * This works with all kind of textfiles (e.g. Gromacs topologies) * We can pass options to the Linker: * `-l` which libraries to link * `-L` where those libraries can be found * can be passed via environment variable `LIBRARY_PATH` * Resolve shared object dependencies during runtime: * use `ldd` to list shared object dependencies * either set `LD_LIBRARY_PATH` (good for debugging) * better set *rpath* in the executable or library * Works the same with most C, C++, Fortran compilers --- class: ## Summary (2) .center[
### This presentation wasn't at all about compilers. ]