# Building Software and Pipelines

Software written in a compiled language is usually built (compiled) with the help of a so-called build system.

For software written in C, C++ or Fortran, a tool called **make** is normally used.

## Building Software
### Example 
(taken from "A Simple Makefile Tutorial" <http://www.cs.colby.edu/maxwell/courses/tutorials/maketutor/> )

A small C-program could consist of these three files: 
* **hellofunc.c**:

```c
#include <stdio.h>
#include <hellomake.h>
void myPrintHelloMake(void) {
  printf("Hello makefiles!\n");
  return;
}
```

* **hellomake.c**:

```c
#include <hellomake.h>
int main() {
  // call a function in another file
  myPrintHelloMake();
  return(0);
}
```

* **hellomake.h**:

```c
/*
example include file
*/
void myPrintHelloMake(void);
```

This could then be built with the command:

```shell
$ gcc -o hellomake hellomake.c hellofunc.c -I.
```

This will build the compiled executable **hellomake** (`-o hellomake`) from the source files `hellomake.c` and `hellofunc.c` while looking for further include files (also called header files) in the current directory (`-I.`).

While this works for small software projects, the command for compiling a program consisting of dozens of files becomes very long and complicated. It also re-compiles every file on every build, even if only a single file has changed.


## Enter Make

The developer usually creates a Makefile, which describes the components and steps of the build process. When **make** is run, it reads the Makefile and builds the software based on the *targets* defined there.

A Makefile for above example might look like this:

#### Makefile1
```Makefile
hellomake: hellomake.c hellofunc.c
	gcc -o hellomake hellomake.c hellofunc.c -I.

```
**Important:** The indentation in Makefiles has to use tab characters (not spaces)!!!

Now one can build `hellomake` with one command:

```shell
$ make -f Makefile1
make: 'hellomake' is up to date.
```

1. If the makefile were named just `Makefile` (not `Makefile1`), typing `make` would be enough.
2. In this run make noticed that `hellomake` had already been compiled and was up to date, so it did nothing.


### Only compile the files that have changed

In `Makefile1` the first line defines `hellomake` as a **target** for which `hellomake.c` and `hellofunc.c` are dependencies. If the target does not already exist or at least one of the dependencies has a newer timestamp than the target, make will run the indented block of commands to create (build) the target.
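The timestamp rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how make is implemented: the helper names `needs_rebuild` and `touch` are made up for this example, and the sketch assumes that all dependencies exist as files.

```python
import os
import tempfile
import time


def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def touch(path, mtime):
    """Create `path` if needed and set its modification time explicitly."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))


# Walk through the three cases in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "hellomake.c")
    target = os.path.join(workdir, "hellomake")
    now = time.time()

    touch(source, now - 100)
    missing_target = needs_rebuild(target, [source])  # target does not exist yet

    touch(target, now - 50)
    fresh_target = needs_rebuild(target, [source])    # target is newer than the source

    touch(source, now)
    stale_target = needs_rebuild(target, [source])    # source was edited afterwards

print(missing_target, fresh_target, stale_target)  # True False True
```

Only the first and the last case would trigger the build commands; the middle case corresponds to the message `'hellomake' is up to date`.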

We can split the build process into pieces, creating a compiled object file from each of the `.c` files and then linking these into the final `hellomake` executable.

#### Makefile2
```Makefile
hellomake: hellomake.o hellofunc.o 
	gcc -o hellomake hellomake.o hellofunc.o -I.
    
hellomake.o: hellomake.c
	gcc -c -o hellomake.o hellomake.c -I.

hellofunc.o: hellofunc.c
	gcc -c -o hellofunc.o hellofunc.c -I.

```

The `-c` option tells the C compiler to compile a source file into an intermediate object file (`.o`) without linking it into an executable.

In addition to that we can introduce variables for our C-compiler and compiler-flags:

#### Makefile2b
```makefile
CC=gcc
CFLAGS=-I.

hellomake: hellomake.o hellofunc.o 
	$(CC) -o hellomake hellomake.o hellofunc.o $(CFLAGS)

hellomake.o: hellomake.c
	$(CC) -c -o hellomake.o hellomake.c $(CFLAGS)

hellofunc.o: hellofunc.c
	$(CC) -c -o hellofunc.o hellofunc.c $(CFLAGS)
```



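We can avoid writing (and maintaining) a new target for every single object (`.o`) file that we want to create from a `.c` file by defining a general pattern rule. Listing the header file as an additional dependency makes sure the object files are rebuilt whenever the header changes: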

#### Makefile3
```makefile
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h

hellomake: hellomake.o hellofunc.o 
	$(CC) -o hellomake hellomake.o hellofunc.o $(CFLAGS)

%.o: %.c $(DEPS)
	$(CC) -c -o  $@  $<  $(CFLAGS)
```

* The line **`%.o: %.c $(DEPS)`** says: any target that ends in **`.o`** depends on the file with the same base name ending in **`.c`**, in addition to the files listed in the variable called **`DEPS`**.
* In the compiler command the **`$@`** macro is replaced with the full name of the target (before the `:`) and
* the **`$<`** macro is replaced with the first item of the dependency list (after the `:`).
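How a pattern such as `%.o` is matched against a target name can be illustrated with a small Python sketch. The functions `match_stem` and `prerequisites_for` are hypothetical helpers written for this example; they mimic only the basic behaviour (one `%` per pattern, non-empty stem) and ignore everything else make does when it searches for a rule.

```python
def match_stem(pattern, name):
    """Return the text that `%` stands for if `name` matches `pattern`, else None."""
    prefix, _, suffix = pattern.partition("%")
    fits = name.startswith(prefix) and name.endswith(suffix)
    # The part matched by % (the "stem") must not be empty.
    if fits and len(name) > len(prefix) + len(suffix):
        return name[len(prefix):len(name) - len(suffix)]
    return None


def prerequisites_for(target, target_pattern, prereq_patterns):
    """Substitute the stem of `target` into every prerequisite pattern."""
    stem = match_stem(target_pattern, target)
    if stem is None:
        return None
    return [p.replace("%", stem, 1) for p in prereq_patterns]


# The rule "%.o: %.c $(DEPS)" with DEPS = hellomake.h
deps = prerequisites_for("hellofunc.o", "%.o", ["%.c", "hellomake.h"])
no_match = prerequisites_for("hellomake", "%.o", ["%.c", "hellomake.h"])

print(deps)      # ['hellofunc.c', 'hellomake.h']
print(no_match)  # None
```

The target `hellomake` does not end in `.o`, so the pattern rule does not apply to it and its own explicit rule is used instead.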



Following the **DRY** principle (**D**on't **R**epeat **Y**ourself), we can simplify a bit more by defining the list of objects that `hellomake` depends on in one place and by using the **`$^`** macro, which is replaced by the full list of dependencies of a target:

#### Makefile4
```Makefile
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h
OBJ= hellomake.o hellofunc.o

hellomake: $(OBJ) 
	$(CC) -o  $@  $^  $(CFLAGS)

%.o: %.c $(DEPS)
	$(CC) -c -o  $@  $<  $(CFLAGS)
```
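The substitution of `$@`, `$<` and `$^` in the recipes above can be mimicked with plain string replacement. The sketch below is a simplified illustration: `expand_recipe` is a made-up helper, and the example assumes that `$(CC)` and `$(CFLAGS)` have already been replaced by `gcc` and `-I.`.

```python
def expand_recipe(recipe, target, dependencies):
    """Substitute make's automatic variables $@, $< and $^ in a recipe line."""
    replacements = {
        "$@": target,                                     # name of the target
        "$<": dependencies[0] if dependencies else "",    # first dependency
        "$^": " ".join(dict.fromkeys(dependencies)),      # all dependencies, duplicates dropped
    }
    for variable, value in replacements.items():
        recipe = recipe.replace(variable, value)
    return recipe


link_cmd = expand_recipe("gcc -o $@ $^ -I.",
                         "hellomake", ["hellomake.o", "hellofunc.o"])
compile_cmd = expand_recipe("gcc -c -o $@ $< -I.",
                            "hellofunc.o", ["hellofunc.c", "hellomake.h"])

print(link_cmd)     # gcc -o hellomake hellomake.o hellofunc.o -I.
print(compile_cmd)  # gcc -c -o hellofunc.o hellofunc.c -I.
```

Note that the compile command uses `$<` and therefore only receives `hellofunc.c`; the header file is a dependency of the object file, but it is not passed to the compiler as a source file.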


Finally we add a "phony" target called "clean" that deletes all objects and the executable:

#### Makefile
```Makefile
CC=gcc
CFLAGS=-I.
DEPS = hellomake.h
OBJ= hellomake.o hellofunc.o

hellomake: $(OBJ) 
	$(CC) -o  $@  $^  $(CFLAGS)

%.o: %.c $(DEPS)
	$(CC) -c -o  $@  $<  $(CFLAGS)

.PHONY: clean

clean:
	rm $(OBJ)
	rm hellomake
```

The `.PHONY` rule tells make that `clean` is not the name of a file: the recipe runs every time `make clean` is called, even if a file named `clean` happens to exist.

```shell
$ make clean
rm hellomake.o hellofunc.o
rm hellomake

$ make
gcc -c -o  hellomake.o  hellomake.c  -I.
gcc -c -o  hellofunc.o  hellofunc.c  -I.
gcc -o  hellomake  hellomake.o hellofunc.o  -I.

```
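Why the `.PHONY` declaration matters can be shown with a small Python sketch of the decision make has to take. The function `should_run` is a hypothetical simplification: it treats phony targets as always out of date and otherwise falls back to the existence and timestamp check.

```python
import os
import tempfile


def should_run(target, dependencies, phony_targets):
    """Decide whether the recipe for `target` has to be executed."""
    if target in phony_targets:
        return True                      # phony targets are always rebuilt
    if not os.path.exists(target):
        return True                      # nothing there yet
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


with tempfile.TemporaryDirectory() as workdir:
    # Somebody accidentally created a file that is called "clean".
    clean_file = os.path.join(workdir, "clean")
    open(clean_file, "w").close()

    without_phony = should_run(clean_file, [], phony_targets=set())
    with_phony = should_run(clean_file, [], phony_targets={clean_file})

print(without_phony, with_phony)  # False True
```

Without the declaration, the existing file with no dependencies counts as up to date and the clean-up commands would be skipped.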

## How can Make be used to automate an analysis workflow?

Imagine you have:

1. several files of raw data,
2. a script `process_data.py` that processes the raw data and writes the processed data into a different file,
3. a script `plot_data.py` that generates a plot/figure/image from the processed data,
4. one or more LaTeX files and bibliography files for a thesis, report, manuscript, etc.

And you want to quickly re-generate your report any time you get new data.



```python
# Generate some data:
import os
import numpy as np

os.makedirs("data", exist_ok=True)  # np.savetxt does not create missing directories
x = np.arange(0, 2*np.pi, np.pi/100)
y1 = np.sin(x)
y2 = np.cos(x)
y3 = np.tan(x)
np.savetxt("data/data1.txt", y1, delimiter=',' )
np.savetxt("data/data2.txt", y2, delimiter=',' )
np.savetxt("data/data3.txt", y3, delimiter=',' )
```


    $ ls
    data/  figure1.svg  plot_data.py  process_data.py  tempdir/
    data1.txt  data2.txt  data3.txt

    $ ls data
    data1.txt  data2.txt  data3.txt


    $ python process_data.py --help
    usage: process_data.py [-h] [-o OUTFILE.CSV] FILE.TXT [FILE.TXT ...]

    Process some data files.

    positional arguments:
      FILE.TXT        name of data file

    optional arguments:
      -h, --help      show this help message and exit
      -o OUTFILE.CSV  name of the output file

    $ python plot_data.py --help
    usage: plot_data.py [-h] [-i INFILE.CSV] [-o PLOT.SVG]

    Plot a datafile.

    optional arguments:
      -h, --help     show this help message and exit
      -i INFILE.CSV  name of the data file
      -o PLOT.SVG    name of the output file


#### Content of `Makefile`:

```Makefile
# Makefile to process datafiles, generate a plot and build LaTeX report.

# Variable with list of files with raw data:
DATA=data/data1.txt data/data2.txt data/data3.txt

report.pdf:  report.tex  figure1.svg
	latexmk -pdf -pdflatex='pdflatex -shell-escape'

figure1.svg:  plot_data.py  tempdir/processed_data.csv
	python plot_data.py  -i tempdir/processed_data.csv  -o figure1.svg

tempdir/processed_data.csv:  process_data.py  $(DATA)
	python process_data.py  $(DATA)  -o tempdir/processed_data.csv


.PHONY:  clean  almost_clean

clean:  almost_clean
	rm report.pdf
	rm figure1.svg

almost_clean:
	latexmk -c
	rm tempdir/processed_data.csv
```
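The order in which make has to run the steps of this workflow follows from the dependencies alone. As a rough illustration, the rules of the Makefile above can be written down as a Python dictionary and sorted with `graphlib.TopologicalSorter` from the standard library (Python 3.9 or newer). The dictionary `rules` is a hand-written copy of the Makefile for this sketch; make itself does not work with such a structure.

```python
from graphlib import TopologicalSorter

# target -> list of dependencies, copied by hand from the Makefile
rules = {
    "report.pdf": ["report.tex", "figure1.svg"],
    "figure1.svg": ["plot_data.py", "tempdir/processed_data.csv"],
    "tempdir/processed_data.csv": [
        "process_data.py",
        "data/data1.txt",
        "data/data2.txt",
        "data/data3.txt",
    ],
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(rules).static_order())

# Keep only the names that have a recipe, i.e. the files make has to create.
build_steps = [name for name in order if name in rules]

print(build_steps)
# ['tempdir/processed_data.csv', 'figure1.svg', 'report.pdf']
```

Make additionally compares timestamps, so when only `plot_data.py` has changed it skips the data processing and re-runs only the plotting and the LaTeX step.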


#### Content of file `report.tex`:

```latex
\documentclass[10pt,letterpaper]{article}
\usepackage{fullpage}
\usepackage{svg}
\usepackage{minted}
\usepackage{float}

\begin{document}
\title{Building a Workflow to create Reports}
\author{Oliver Stueker}
\date{\today}
\maketitle

\begin{abstract}
This document is created by a make script.
\end{abstract}

\section{Introduction}
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam.

\begin{figure}[!ht]
	\centering
	\includesvg[width=0.5\columnwidth]{figure1}
	\caption{The plotted example data.}
\end{figure}

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam.


\end{document}
```


#### Content of file `process_data.py`:

```python
#!/usr/bin/env python3
import argparse
import os
import numpy as np
import pandas as pd

# use argparse package for processing command line arguments
parser = argparse.ArgumentParser(description='Process some data files.')
parser.add_argument('infiles', metavar='FILE.TXT', type=str, nargs='+',
                    help='name of data file')
parser.add_argument('-o', dest='outfile', metavar='OUTFILE.CSV',
                    default="tempdir/processed_data.csv",
                    help='name of the output file')
args = parser.parse_args()

# load all inputfiles and store arrays in dict
data = {}
for arg in args.infiles:
    dat = np.loadtxt(arg)
    name = arg.split('/')[-1].split('.')[0]
    data[name]=dat

# create dataframe from dict
df = pd.DataFrame(data)

# do some 'fancy' processing ;-)
df['prod'] = df['data1'] * df['data2'] + df['data3']

# create temp dir
if not os.path.exists('tempdir'):
    os.mkdir('tempdir')

# export processed data
df.to_csv(args.outfile, index=False)
```


#### Content of file `plot_data.py`:

```python
#!/usr/bin/env python3
import argparse
import os
import pandas as pd

# use argparse package for processing command line arguments
parser = argparse.ArgumentParser(description='Plot a datafile.')
parser.add_argument('-i', dest='infile', metavar='INFILE.CSV', 
                    default="tempdir/processed_data.csv",
                    help='name of the data file')
parser.add_argument('-o', dest='outfile', metavar='PLOT.SVG',
                    default="figure.svg",
                    help='name of the output file')
args = parser.parse_args()

df = pd.read_csv(args.infile)

plot = df.plot(ylim=(-5,5))
plot.figure.savefig(args.outfile)

```