diff --git a/02cpp1/sec03ObjectOrientedProgramming.md b/02cpp1/sec03ObjectOrientedProgramming.md index 11c950a00..895fc8498 100644 --- a/02cpp1/sec03ObjectOrientedProgramming.md +++ b/02cpp1/sec03ObjectOrientedProgramming.md @@ -152,9 +152,9 @@ int main() - `getCount()` returns an integer _by value_, so it returns a copy of `count`. We can't modify `count` through this function or the value we get back from it. - `count` is now private, so if we try to access this directly from outside the class the compiler will raise an error. -## Using Objects for Data Integrity +## Class Invariants and Using OOP for Data Integrity -An extremely useful aspect of defining a new type via a class is the ability to provide guarantees that any object of that type satisfies certain properties. These properties allow programmers to write programs that are more efficient and correct with less overhead for error checking. +An extremely useful aspect of defining a new type via a class is the ability to provide guarantees that any object of that type satisfies certain properties; such properties are often referred to as class _invariants_. These properties allow programmers to write programs that are more efficient and correct with less overhead for error checking. Let's explore this with some examples. @@ -182,7 +182,7 @@ class Ball }; ``` -and then we can call the density directly without another calculation. The problem that we now have is that in order for our data to be self-consistent, **a relationship between the radius, mass, and density must be satisfied**. +and then we can call the density directly without another calculation. The problem that we now have is that in order for our data to be self-consistent, **a relationship between the radius, mass, and density must be satisfied**. This kind of relationship is called an **invariant**: a property that must be maintained by all instances of a class. Invariants are very important for writing safe programs, and for being able to reason about the behaviour of programs. We could approach this problem by calculating the density in the constructor, and making the radius, mass, and density **private**. This means that external code can't change any of these values, and therefore they can't become inconsistent with one another. But we still need to be able to _read_ these variables for our physics simulation, so we'll need to write **getter** functions for them: ```cpp @@ -268,7 +268,34 @@ This is particularly bad because we'd expect people to look up books far more of If our list were _sorted_, then we can search much more quickly using a [binary search](https://en.wikipedia.org/wiki/Binary_search_algorithm). A binary search on a sorted list starts by looking at the element in the middle of the list and checks if the item we're looking for should come before or after that. We then only need to search the half of the list that would contain the book we're looking for. We then apply the same thing again to narrow the list down by half again, and so on. At every step we half the size of the list and therefore the number of titles we have to check is proportional to _the logarithm of the size of the list_. This is much, much better performance, especially if the size of the list is large. A binary search with 21 comparisons could search a list of over a million books! 
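+To make this concrete, here is a minimal sketch of a binary search over a sorted catalogue (this helper is illustrative and not part of the course code):
+
+```cpp
+#include <string>
+#include <vector>
+
+// Returns true if title is present in catalogue, assuming catalogue is sorted.
+bool containsTitle(const std::vector<std::string>& catalogue, const std::string& title)
+{
+    std::size_t low = 0, high = catalogue.size();
+    while (low < high)
+    {
+        std::size_t mid = low + (high - low) / 2;
+        if (catalogue[mid] == title) { return true; }
+        if (catalogue[mid] < title) { low = mid + 1; } // title must be in the upper half
+        else                        { high = mid; }    // title must be in the lower half
+    }
+    return false;
+}
+```
+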
-Of course, we don't want to sort our data before searching it every time (that would be even more wasteful than our linear search), and we want to know with certainty that our list is always sorted, otherwise our binary search could fail. Using an object is a solution: we can define a wrapper class which keeps the list private, and provides an insertion method which guarantees that new entries are inserted into their proper place. Then **we can take advantage of speedier lookup because we know that our catalogue is always in sorted order**. (Incidentally, this would normally be done with a _balanced binary search tree_, an example of which is the C++ `map` type.)
+Of course, we don't want to sort our data before searching it every time (that would be even more wasteful than our linear search), and we want to know with certainty that our list is always sorted, otherwise our binary search could fail. Using an object is a solution: we can define a wrapper class which keeps the list private, and provides an insertion method which guarantees that new entries are inserted into their proper place. Then **we can take advantage of speedier lookup because we know that our catalogue is always in sorted order**. In this case our _invariant_ is the property of being sorted, or put more explicitly $i < j \implies x_i \le x_j$. (Incidentally, this would normally be done with a _balanced binary search tree_, an example of which is the C++ `map` type.)
+
+### Reasoning About Class Invariants
+
+From these examples we can see an important pattern arise: an object will maintain the desired property if it is constructed in a state which has that property, and if all permissible operations on the object maintain that property. This is a form of _inductive reasoning_, where the initial construction of the object serves as a base case, and all other possible states of the object are found by the operations on that object (calling member functions or manipulating public data). To design a class where any object of that class maintains a property $P$, you should:
+
+- Write your constructor so that $P$ is guaranteed for any constructed object. Be wary of uninitialised variables within your class.
+- Make a variable `private` if a modification of that variable can alter the property $P$. For example, to maintain a list as being sorted we made the underlying `vector` private because any modification of the data in the array could violate the sorting property. To protect our `Ball` class we made the `mass`, `radius`, and `density` private since modifying any one of these could violate the physical relationship between these parameters.
+- Make sure that getters don't return private member variables by reference or through pointers unless there are appropriate `const` protections on the data, as this would otherwise allow unguarded modifications to the state.
+- Ensure that any functions that modify the state of the class do not violate the property. In the case of our sorted list, this means that the insertion must update the list in a way that it remains sorted. Be sure to check any setters, as with the `Ball` class: modifying any one of the properties of the ball has consequences for the others. It's a good idea to mark any functions that should not modify the state as `const` so that the compiler can spot any potential risks (see below).
+
+### Protecting State with `const` Members
+
+A member function that is declared `const` cannot modify the state of any member variables in that object.
This is a very useful guarantee when reasoning about objects of a given class, and the compiler can help us enforce this.
+
+Consider the getter functions from our `Ball` class, or if we wanted to add a function to print the ball's state. Functions like these should never change the state of the ball itself, so we can mark them as `const`. We can see an example of this in the following code (the rest of the class definition is omitted for brevity).
+
+```cpp
+class Ball
+{
+    public:
+    double getRadius() const {return radius;}
+    double getMass() const {return mass;}
+    double getDensity() const {return density;}
+};
+```
+- The `const` keyword comes after the function name and arguments but before the code block.
+- A `const` member function cannot call any other member functions which are not `const`.

 ## Aside: Organising Class Code in Headers and Source Files

@@ -277,7 +304,8 @@ As we saw last week, C++ code benefits from a separation of function declaration
 In the header file, we should declare the class as well as:
 1. What all of its member variables are
 2. Function declarations for all of its member functions
-3. Can also include full definitions for trivial functions such as getter/setter functions
+3. Can also include full definitions for trivial functions such as getter/setter functions if marked `inline`. This can help the compiler optimise these function calls.
+4. If a member function is marked `const` the keyword needs to appear in both the declaration (header file) and the implementation (source file).

 For example:

 **In `ball.h`:**

 class Ball

     Ball(std::array p, double r, double m);

     std::array position;

-    double getRadius(){return radius;}
-    double getMass(){return mass;}
-    double getDensity(){return density;}
+    inline double getRadius() const {return radius;}
+    inline double getMass() const {return mass;}
+    inline double getDensity() const {return density;}

     private:

     void setDensity();
diff --git a/04cpp3/sec03Templates.md b/04cpp3/sec03Templates.md
index 5aea6546b..9ec883d3a 100644
--- a/04cpp3/sec03Templates.md
+++ b/04cpp3/sec03Templates.md
@@ -95,6 +95,43 @@ vector everyOther(vector &v_in)
 - The exact details of the type `T` don't matter in this case, since we never access the data of type `T` anyway. The only restriction on `T` is that it can be added to a vector.
 - A function can be generated for every kind of vector in this way.

+### Using `auto` with Function Templates
+
+The keyword `auto` can be very useful in function templates. It can be used for variable declarations as well as the output type of a function template where the types will not be known until template substitution takes place. This is particularly useful when you are writing functions that template over callable objects. A trivial example would be:
+
+```cpp
+template <typename F, typename In>
+auto apply(F f, In x)
+{
+    auto y = f(x);
+    return y;
+}
+```
+
+Here the template parameter `F` can stand in for something callable: a function, `std::function`, lambda-expression, or callable object. From this code we cannot immediately tell what type the result of applying the function to `x` will be, and therefore what the return type of this function is. Using `auto` allows us to write code like this while leaving it to the compiler to infer this type as well as it is able.
+
+
+In principle we could try to explicitly template over this additional unknown type, as in the code example below.
+
+```cpp
+template <typename Out, typename F, typename In>
+Out apply2(F f, In x)
+{
+    Out y = f(x);
+    return y;
+}
+```
+
+In this case the compiler will fail to infer the type of `Out`.
The reason for this is that the compiler needs to be able to deduce all the template parameters _at the call site_. Now that we've introduced an additional parameter `Out`, we would be calling a function like this:
+
+```cpp
+int z = apply2(f, x);
+```
+
+which would allow the compiler to deduce the types `F` and `In`, but gives no information about `Out` since it does not appear in the arguments. In order to make this version work, we would need to supply the template parameter explicitly using `<>`.
+
+By contrast, when we use `auto` the compiler can determine at the call site the types of `F` and `In` from the arguments supplied, and can then immediately find or generate the appropriate `apply` function. The type of `y` and the output of the function are then determined with full information about the function rather than just with the limited information at the call site.
+
 ## Using Templates with Overloaded Functions

 One very useful way to make use of templates is to exploit operator / function overloading. Operators or functions which are "overloaded" can operate on multiple types, for example:
diff --git a/04cpp3/sec04VariadicTemplates.md b/04cpp3/sec04VariadicTemplates.md
new file mode 100644
index 000000000..570822ace
--- /dev/null
+++ b/04cpp3/sec04VariadicTemplates.md
@@ -0,0 +1,255 @@
+---
+title: Variadic Templates
+---
+
+# Variadic Templates and Fold Expressions
+
+Up to now, we have only worked with templates that take a fixed number of parameters. Variadic template parameter packs (a C++11 feature) and fold expressions (added in C++17) make it possible to write functions which work with _any number_ of arguments.
+
+As with all template code, the compiler must see the full template definition wherever it is instantiated, so **functions using variadic templates and fold expressions are normally placed in header files**.
+
+A consideration that applies to all variadic functions (and template functions in general) is the potential for "code bloat" in the machine code, that is for compilation to result in large binaries. This is because the compiler will generate a new function for every new matching signature: that means any variation in the number or types of arguments. Code bloat is not always a major problem if you have the available memory, but it is an important consideration for more restricted circumstances like small devices.
+
+## Parameter Packs
+
+A _parameter pack_ allows a function template to take an arbitrary number of arguments of the same logical category. For example, we might want a function that computes an operation over several numbers. We will look at two examples which use common patterns for writing variadic template functions:
+```cpp
+template <typename T, typename... Ts>
+T variadic_sum(Ts... xs);
+
+template <typename T, typename... Ts>
+T quadrature(T x, Ts... xs);
+```
+You may wonder why the latter template is written with both a single type `typename T` and a parameter pack `typename... Ts` in the arguments, rather than putting everything into one pack. The structure is intentional and it helps in two important ways:
+- By separating out the first argument and using it as the return type it is clearer what type will be returned by a call and what implicit conversions may take place if the function is called with a mix of types (e.g. `double`, `float`, `int`), rather than leaving it up to the compiler (which may also fail to infer the output type if there is not sufficient information). Every subsequent argument in the parameter pack must be convertible to the same type `T`.
+- In general, **a parameter pack may be empty**. For some functions, however, such as finding the maximum of a list, the operation only makes sense if there is at least one value to operate on. By writing the function as:
+  ```cpp
+  T quadrature(T x, Ts... xs)
+  ```
+  the first value `x` must always be present. If we instead wrote:
+  ```cpp
+  template <typename... Ts>
+  auto quadrature (Ts... xs);
+  ```
+  then the compiler would happily accept a call like `quadrature()` (zero arguments), which is not well defined. Splitting off the first argument avoids this situation without needing additional error-handling constructs.
+
+### Unpacking Parameter Packs
+
+There are a variety of ways of extracting information from the parameter packs. One very important piece of information is the _number of arguments_ in the pack. We can extract this using the `sizeof...()` operator. (Note that the ellipsis `...` is part of the operator's name in this case!)
+
+```cpp
+template <typename T, typename... Ts>
+T variadic_sum(T x, Ts... args)
+{
+    const int N = sizeof...(args);
+
+}
+```
+
+This function is of course incomplete! One thing that we can do is convert a pack to an array of variables:
+
+```cpp
+template <typename T, typename... Ts>
+T variadic_sum(Ts... args)
+{
+    const int N = sizeof...(args);
+    T ys[N] = {args...}; // get array of arguments
+
+    T sum = 0;
+    for(int i = 0; i < N; i++)
+    {
+        sum += ys[i];
+    }
+
+    return sum;
+}
+```
+
+This gives us our array of arguments, but note that we have now imposed a restriction: since `ys` is an array of `T`, all of the arguments must be implicitly converted to type `T` here.
+
+We can then call the function as follows:
+```cpp
+int main()
+{
+    double s = variadic_sum<double>(1.0, 2.0, 4, 5.1);
+    printf("Sum = %f\n", s);
+
+    return 0;
+}
+```
+Note that we have had to specify the first type (`T`) as the compiler is unable to infer it. You can specify as many of the parameters as you need to, **in order**. For example, the following are all valid function calls:
+
+```cpp
+    double s1 = variadic_sum<double>(1.0, 2.0, 4, 5.1);
+    double s2 = variadic_sum<double, double, double, double>(1.0, 2.0, 4, 5.1);
+    double s3 = variadic_sum<double, double, double>(1.0, 2.0, 4, 5.1);
+```
+In the definition of `s2` the `4` is implicitly converted into a `double` before being passed into the function, since that function takes the first four arguments as `double`. In the definition of `s1` it is passed as an `int` and converted when we define `ys`, since this is an array of `T`, which is given type `double`. In the definition of `s3` the `4` is again passed as an `int` and is converted when `ys` is defined.
+
+We can also write a range based loop to avoid potential sizing errors:
+
+```cpp
+template <typename T, typename... Ts>
+T variadic_sum(Ts... args)
+{
+    T sum = 0;
+    for(const auto& y : {args...})
+    {
+        sum += y;
+    }
+
+    return sum;
+}
+```
+
+A special consideration is needed here because we have not explicitly converted the `args...` to an array of a specific type. The compiler will try to do this to `{args...}` automatically, but if not all of our arguments are the same type then there will be a compilation error. As such, our mixed `double` and `int` arguments will fail here, and we would need to make sure all arguments are explicitly doubles like so:
+
+```cpp
+int main()
+{
+    double s1 = variadic_sum<double>(1.0, 2.0, 4, 5.1);   // compilation error: 4 is an int
+    double s2 = variadic_sum<double>(1.0, 2.0, 4.0, 5.1); // 4.0 is interpreted as double
+}
+```
+
+## Fold Expressions
+
+A more concise way of writing functions over parameter packs can often be found by using C++17 fold expressions. A fold is a kind of _reduction_ expression.
A fold takes a list of elements $(x_0, x_1, ..., x_{N-1})$, a binary operator $\oplus$, and a special element $i$, and applies the operator $\oplus$ between the elements. For an associative operator, a reduction looks like this:
+
+$i \oplus x_0 \oplus x_1 \oplus ... \oplus x_{N-1}$.
+
+However, if the operator is non-associative, that is if $x \oplus (y \oplus z) \neq (x \oplus y) \oplus z$, then we can distinguish between right folds
+
+$(x_0 \oplus (x_1 \oplus (... \oplus(x_{N-1} \oplus i) ...)))$,
+
+and left folds
+
+$(...((i \oplus x_0) \oplus x_1) \oplus x_2) \oplus ... ) \oplus x_{N-1}$,
+
+by their order of operations and by whether the special element $i$ is applied to the first or last element. ($i$ is usually the _identity_ element of $\oplus$, so it usually does not matter where this element goes, and the main concern with right versus left folds is generally the order of operations for non-associative operators.)
+
+Fold expressions allow us to apply an operator across all elements of a parameter pack in a single, compact expression. This is particularly useful for numerical operations like sums and products.
+
+Let us examine the simplest case of our `variadic_sum` again:
+
+```cpp
+template <typename T, typename... Ts>
+T variadic_sum(Ts... args)
+{
+    T sum = (0 + ... + args); // left fold expression
+
+    return sum;
+}
+```
+The fold expression `(0 + ... + args)` automatically expands depending on how many arguments are provided. This avoids writing separate overloads for 2, 3 or more inputs, and significantly reduces boilerplate.
+
+Note that we have started the fold with `0`, which is the _identity_ element of the addition operator ($x + 0 = x$ for all $x$). This is very common in reduction expressions, for example in products you will often start with `1` which is the multiplicative identity. You can, of course, start with any element you like, for example `T sum = (5 + ... + args);` would give $5 + \sum_i x_i$.
+
+We can also write this as a right fold expression:
+
+```cpp
+template <typename T, typename... Ts>
+T variadic_sum(Ts... args)
+{
+    T sum = (args + ... + 0); // right fold expression
+
+    return sum;
+}
+```
+
+Mathematical addition is associative, but remember that on a computer _floating point addition is not perfectly associative due to rounding errors_, and therefore your results may vary depending on your order of summation (i.e. left vs right fold).
+
+There is a [list of operators available for use in fold expressions](https://en.cppreference.com/w/cpp/language/fold.html), however our expressions can also be more complex than simple reductions; as an example, in scientific computing it is common to combine independent uncertainties ($\sigma_i$) by adding them in quadrature:
+$$
+\sigma_\mathrm{tot} = \sqrt{\sigma_0^2 + \sigma_1^2 + \ldots + \sigma_{N-1}^2}.
+$$
+A variadic template with a fold expression provides a natural, compact implementation:
+```cpp
+#include <cmath>
+
+template <typename T, typename... Ts>
+constexpr T quadrature(T x, Ts... xs) {
+    const T sumsq = (x*x + ... + (xs*xs)); // left fold expression
+    return std::sqrt(sumsq);
+}
+```
+Note that by separating off the first argument we are forcing this function to take at least one argument, and can therefore also dispense with any special element $i$ like the identity.
+
+Here the binary operator is again addition but we are squaring the elements before they are added. You can use folds to calculate expressions of the form
+
+$(f(x_0) \oplus (f(x_1) \oplus (... \oplus (f(x_{N-1}) \oplus i) ...)))$,
+
+or
+
+$(...(((i \oplus f(x_0)) \oplus f(x_1)) \oplus f(x_2)) \oplus ... ) \oplus f(x_{N-1})$
+
+for some function $f$, or more generally you can replace $f(x)$ with an _expression_ which depends on $x$. For example `(std::cout << x)` is an expression but not a function; the following variadic function prints out whatever arguments are supplied.
+
+```cpp
+template <typename... Ts>
+void print_args(Ts... args)
+{
+    ((std::cout << args << " "), ..., (std::cout << std::endl));
+}
+```
+
+In this case ["`,`" is the operator](https://en.cppreference.com/w/cpp/language/operator_other.html#Built-in_comma_operator), which just evaluates expressions separated by a comma left-to-right, and `std::cout << args << " "` is the expression that is applied to each variable, and `std::cout << std::endl` is the special element (which is also an expression because `,` operates on expressions).
+
+### "Unary" Fold Expressions
+
+We can also write left and right fold expressions as a so-called "unary fold" expression. This does _not_ mean that the operator is a unary operator: $\oplus$ is always a binary operator! Instead it means that we don't supply the special element $i$, like this:
+
+```cpp
+template <typename T, typename... Ts>
+T unary_sum(Ts... args)
+{
+    T sum = (args + ...); // Unary right fold
+
+    return sum;
+}
+```
+The _unary_ in this case means that we don't have anything on the other side of the `+ ...`. A unary left fold is defined similarly:
+
+```cpp
+template <typename T, typename... Ts>
+T unary_sum(Ts... args)
+{
+    T sum = (... + args); // Unary left fold
+
+    return sum;
+}
+```
+
+Since there is no special element $i$ in the expression, **this function is undefined for an empty parameter pack**, and therefore will not compile if called with no arguments. This is another way of enforcing a non-empty argument list without needing to separate out the first argument.
+
+### Advanced Reductions
+
+C++ supports a limited number of binary operators for its fold syntax. What if we want to use something outside of this list? In principle we can use an arbitrary function $f: A \times A \rightarrow A$ for a left or right fold. These however have to be implemented manually, which we can do with a _recursion_.
+
+```cpp
+template <typename Op, typename T, typename... Ts>
+constexpr T foldf(Op f, T x1, T x2, Ts... xs)
+{
+    T y = f(x1, x2);
+    if constexpr (sizeof...(Ts) == 0)
+    {
+        return y;
+    }
+    else
+    {
+        return foldf(f, y, xs...);
+    }
+}
+```
+
+- Since `f` is a binary function we have stipulated at least two arguments so that we can evaluate the function.
+- Note the function call `foldf(f, y, xs...)`. In this call `y` will become the new `x1` and the first element of `xs` (i.e. the next element of the list) becomes `x2`. We know that there is at least one element in `xs` because we checked in the `if` statement.
+- Note the use of `constexpr` in the `if` statement. The purpose of this is to allow the `if` statement to be evaluated at _compile time_ instead of at runtime; in this way the compiler will generate different code depending on whether there are elements of `xs` left or not. If it couldn't do this at compile time we would have a problem when the size of `xs` reaches 0, since `foldf(f, y, xs...)` can't be compiled unless there is at least one argument in `xs` since `foldf` takes a minimum of three arguments.
+    - You can find out more about `constexpr` in the notes for week 7, where we will look at them in the context of compiler optimisation.
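+
+As a quick usage sketch (not from the notes, and assuming the `foldf` template above), we can pass any binary callable as the operator, for example a lambda implementing a `max` operation, which is not one of the built-in fold operators:
+
+```cpp
+#include <iostream>
+
+int main()
+{
+    // maxOp is an illustrative binary operator: it returns the larger of its two arguments.
+    auto maxOp = [](double a, double b) { return a > b ? a : b; };
+
+    // foldf applies maxOp pairwise from left to right: ((1.5, -2.0), 7.25), 3.0.
+    std::cout << foldf(maxOp, 1.5, -2.0, 7.25, 3.0) << std::endl; // prints 7.25
+
+    return 0;
+}
+```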
+ +This kind of recursive approach to variadic functions can give us access to much more expressive and powerful functions, but there is a risk of code bloat if you provide a large number of arguments. As an example let's see what happens when we call this function on a simple addition function for a sequence of numbers: + +```cpp +std::cout << foldf(f, 1, 2, 3, 4, 5, 6, 7, 8, 9) << std::endl; +``` + +If you look at the generated assembly code for the compiled program (you can use `objdump -d `) you will find that there are 8 different implementations for `foldf`, one for each number of arguments that has been passed! Since `foldf` is in this case recursive we have called `foldf` with 10 arguments, 9 arguments, and so on down to 3 arguments. diff --git a/07performance/DataStructures.md b/07performance/DataStructures.md new file mode 100644 index 000000000..4526f3ac9 --- /dev/null +++ b/07performance/DataStructures.md @@ -0,0 +1,86 @@ +--- +title: Computational Complexity +--- + +# Common Data-Structures + +There are some key data-structures with which you will need to be familiar. Each of these is a kind of container; we shall consider how these structures affect how we can access and insert data. We will discuss the complexity of data access and insertion for some of these in class. + +## Random Access Arrays + +A _random access array_ is the kind of array with which you are already familiar. Data is laid out sequentially and contiguously in memory, and so it is easy to calculate the memory location of any given element from the starting memory location and the index $i$ of the element. This is why they are called _random access_: we can access any element of the array in $O(1)$ time and no elements are harder to find than any others. Inserting an element into a random access array is often cumbersome due to the need to keep the data contiguous: if you want to insert data somewhere other than the end of the array then you need to shift all elements that appear afterwards in memory. This operation is $O(n)$. + +`std::vector`, `std::array`, and C-style arrays are all examples of random access arrays. + +### In detail: `std::vector` in C++ + +The most common form of random access array that we use in C++ is `std::vector`, so it's worth understanding a bit more about how it works. `std::vector` comes with operations to insert and remove elements, and it distinguishes between insert/removal in the middle of the vector and at the end. + +#### Time complexity of `push_back` and `pop_back` + +To add an element at the end of an array we can use `push_back`. To understand this operation's behaviour from a complexity point of view, we have to think about the way that `vector` allocates and manages memory. A `vector` will allocate memory on the heap for its elements, and it will often allocate more memory on the heap than it needs to store its elements. It has separate data members to keep track of the size of the vector (i.e. the number of elements), and the size of the allocation. If the allocation is larger than the current size of the vector, then `push_back(x)` can simply place `x` in the next address in memery and increment the size counter. If however the allocation is full, a new, larger, allocation will need to be made and the entire array of data copied over to the new allocation in order to have space to add another element. (The previous allocation is then freed.) 
This means that some `push_back` operations take much longer than others, and as the vector gets bigger the time for this copy keeps getting larger! Just how much bigger the allocation should be made each time can have a significant impact on how the structure performs: `std::vector` uses a strategy that guarantees that the _average_ (amortised) time for a `push_back` operation remains $O(1)$. (This is because although the time for a reallocating `push_back` keeps increasing as the array gets larger, the frequency of these operations goes down. For example, if you double the size of the allocation each time you reallocate, `push_back` will have amortised constant time.)
+
+Although `push_back` takes amortised constant time, some `push_back` operations will take longer than others, which may be a concern if you have latency restrictions. Because of the reallocations and the need to check the size of the existing allocation, repeated `push_back` operations carry some overhead compared to simply setting the values within a `vector`. As such, **when the size of the `vector` needed is known ahead of time, it is better to initialise a `vector` of the correct size and then set the elements in a loop rather than using `push_back`** inside a loop. Using `push_back` can be the most natural approach however when this is not possible, for example streaming data in from a file where the total number of elements is not known.
+
+There is a corresponding operation for removing an element from the end of the list, `pop_back`, which is always constant time since it cannot trigger a reallocation.
+
+#### Insertion and removal of arbitrary elements
+
+We also have `insert` and `erase` for inserting and removing elements. Removing elements will not require any reallocations, but it does require shifting any data to the right of the elements being deleted. Removing an element is therefore, on average, $O(n)$. Inserting an element works similarly, as it needs to shift any data to the right of the location being inserted into; an additional factor is that like `push_back` it may trigger a reallocation. Insertion is on average $O(n)$.
+
+## Linked Lists
+
+A _linked list_ is a representation of a list that is stored like a graph: each element of the list consists of its data and a pointer to the next element of the list. A linked list has no guarantees of being stored contiguously, so the only way to navigate the linked list is to follow the pointers from one node to the next; this is in contrast to random access arrays. A common extension of the linked list is the doubly-linked list, which has pointers to the next _and_ previous element in a list.
+
+The diagram shows a possible layout of linked list nodes in memory. The red grid shows memory locations, blue cells are occupied by a linked list data element and the yellow cells are occupied by a linked list pointer; arrows show the location that each pointer stores. Note that the list can be terminated with a null pointer.
+
+![image](img/LinkedListMemory.png)
+
+Accessing element $i$ of the list requires us to read and follow $i$ pointers, and the amount of work done to find elements increases linearly as we get further into the list. The advantage of a linked list however is that we can add or remove elements more straightforwardly by simply modifying the relevant pointers. This is much simpler than removing or inserting elements in the middle of a random access array, which requires copying memory to keep all the elements correctly in order.
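+
+As a minimal sketch (the `Node` type and function names here are illustrative, not from the course code), inserting after a known node in a singly-linked list only involves updating two pointers, regardless of how long the list is:
+
+```cpp
+// A bare-bones singly-linked list node.
+struct Node
+{
+    int data;
+    Node* next;
+};
+
+// Insert newNode immediately after pos in O(1): no other elements are moved or copied.
+void insertAfter(Node* pos, Node* newNode)
+{
+    newNode->next = pos->next;
+    pos->next = newNode;
+}
+```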
+ +Linked lists also provide natural representations for various scenarios: + +- Singly linked lists can have multiple lists share the same tail. +- Infinite cycles can be easily represented as linked lists without additional book-keeping. +- Linked lists are _recursive data structures_, which make some algorithms natural to express as simple recursive functions. + +`std::list` is usually implemented as a doubly-linked list, and `std::forward_list` is usually implemented as a singly-linked list. + +## Binary Search Trees + +A _binary search tree_ (BST) is another graph based structure, where each node consists of its data, and pointers to a left and right sub-tree (the "left child" and "right child"). The data stored in a BST must admit a comparison operator $<$, so that for a given node with data $d$: + +- for all data $d_L$ in the left sub-tree, $d_L < d$, and +- for all data $d_R$ in the right sub-tree, $d_R >= d$. + +A BST is therefore always _sorted_ w.r.t. keys. It is often used to implement _associative arrays_, which is a set of key-value pairs that allow look-up of values based on key. You may be familiar with this concept as a _dictionary_ in Python. If the data in a BST is a key-value pair $(k,v)$, then the ordering is just on the key $k$. Looking up a value based on a key requires traversing the tree from its root and comparing the keys to determine whether to look up the left or right sub-tree at each node. + +The diagram below shows a BST; the first piece of data (in the blue cell) is the _key_, followed by value, the left child pointer, and the right child pointer. A null pointer can be used to represent a lack of left or right child. Note that the tree is sorted by the _key_ and is not sorted on _values_. + +![image](img/BinarySearchTree.png) + +The complexity of many operations on BSTs is determined by the _height_ of the tree. The height of a tree (or subtree) is the maximum length of path that we can take from the top node until we reach a node with no children and can go no further. The diagram above shows a tree of height 3; the subtree starting at the node with $k=4$ has height 2. + +Like a linked list, a BST is not necessarily contiguous, and different nodes may be located anywhere in memory. In order to explore a BST for look-up or insertion we have to follow a chain of pointers to find the memory locations of the nodes. + +There are variations on BSTs, such as red-black trees, called _balanced_ BSTs. A balanced BST guarantees that the left and right sub-trees of any given node are similar in size. (More precisely, that the _height_ of the left and right sub-tree differ by no more than 1.) These structures avoid the worst case scenarios that we will discuss in class! + +`std::map` is usually implemented as a balanced BST. + +## Hash Tables + +A _hash table_ is an alternative implementation for an associative array. It consists of a "table" in the form of a random access array. In order to find at which index $i$ of the table a key-value pair $(k,v)$ is stored, we have function $i = h(k)$. An ideal hash function is constant time on all keys and minimises the chance of collisions. A _collision_ between two keys $k_1$ and $k_2$ is when $h(k_1) = h(k_2)$. Note that it is not possible for $h$ to be completely collisionless unless you have at least as many rows in your hash table as elements you want to insert. 
Generally the table stores some kind of list at each index in order to resolve collisions, so in practice you will typically have a random access array of _pointers_, and each pointer will point to an array (or list, or similar structure) containing all the key-value pairs which hash to that index.
+
+The diagram below gives an example of a hash table for the same data as the BST above. In the ideal case all the keys would map to different indices in the table, but for illustration purposes we have shown a number of collisions. Applying the hash function to each key on the left gives us an index in the random access array, and then we can follow a pointer to all the data stored at that index.
+
+![image](img/HashTable.png)
+
+How quick it is to look up an element in this list will depend on the kind of structure used (for example, all of the above structures could be used!) but the key to a hash table's performance is that **collisions should be rare** so that the size of these lists remains small. If the number of colliding keys is bounded by a constant then the look-up/insertion in the list will be constant time, and since the hash function and checking the random access array are also constant time, hash tables have $O(1)$ operations for insertion, look-up, and removal. Just because the complexity is $O(1)$ (under appropriate circumstances) doesn't necessarily mean that hash tables are faster than other methods though: there are overheads to think about as well, especially from the hash function! Often a BST will be faster for moderately sized data. Hash tables can also require allocating more memory than you need.
+
+`std::unordered_map` is usually implemented as a hash table.
+
+## Cache Performance of Data-Structures
+
+As we've seen from the above, structures like linked lists, binary search trees, and hash tables can be highly fragmented in memory (i.e. they are not necessarily contiguous). This prevents us from getting substantial performance advantage from hardware caching the way that we do when iterating over contiguous arrays. When we can, it is desirable to store data in a so-called "flat" data structure, i.e. a contiguous block of memory with data stored in an appropriate order. This makes iterating over data faster due to the cache benefits, and also makes data easier to send to external devices such as GPUs or other CPUs (as we'll see when we explore MPI in weeks 9 and 10).
+
+This does not mean that we shouldn't use data structures like linked lists or BSTs, but it's important to understand the advantages and disadvantages to make informed choices. For example, if we need an associative array of key-value pairs, simply storing them in a random access array would make it very hard to look up the value for a given key. Having a structure like a BST makes this look-up fast even though the data is more fragmented.
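+
+As an illustrative sketch (the data here is made up), the standard associative containers discussed above let us express key-based look-up directly, with `std::map` giving $O(\log n)$ look-up via a balanced BST and `std::unordered_map` giving (amortised) $O(1)$ look-up via a hash table:
+
+```cpp
+#include <iostream>
+#include <map>
+#include <string>
+#include <unordered_map>
+
+int main()
+{
+    // Key-value pairs: book title -> shelf number (illustrative data).
+    std::map<std::string, int> sortedCatalogue = {{"Contact", 7}, {"Dune", 12}, {"Solaris", 3}};
+    std::unordered_map<std::string, int> hashedCatalogue(sortedCatalogue.begin(), sortedCatalogue.end());
+
+    std::cout << sortedCatalogue.at("Solaris") << std::endl; // O(log n) look-up, prints 3
+    std::cout << hashedCatalogue.at("Dune") << std::endl;    // O(1) average look-up, prints 12
+
+    return 0;
+}
+```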
\ No newline at end of file diff --git a/07performance/img/BinarySearchTree.png b/07performance/img/BinarySearchTree.png new file mode 100755 index 000000000..a5a541a5e Binary files /dev/null and b/07performance/img/BinarySearchTree.png differ diff --git a/07performance/img/HashTable.png b/07performance/img/HashTable.png new file mode 100755 index 000000000..3bb3dbd46 Binary files /dev/null and b/07performance/img/HashTable.png differ diff --git a/07performance/img/LinkedListMemory.png b/07performance/img/LinkedListMemory.png new file mode 100755 index 000000000..fd083e21e Binary files /dev/null and b/07performance/img/LinkedListMemory.png differ diff --git a/07performance/index.md b/07performance/index.md index ece2108a5..590ad8eca 100644 --- a/07performance/index.md +++ b/07performance/index.md @@ -20,7 +20,12 @@ Even though parallelism can help us improve our throughput, single core optimisa - Speed of different kinds of memory access. - Cache structure and operation. - Writing algorithms to effectively exploit the cache. -4. [Compiler Optimisation](sec03Optimisation.html) +4. [Common Data-Structures](DataStructures.html) + - Random Access Arrays + - Linked Lists + - Binary Search Trees + - Hash Tables +5. [Compiler Optimisation](sec03Optimisation.html) - Automated optimisation by the compiler. - Compiler flags for optimisation. - Examples of optimisations, pros and cons. diff --git a/07performance/sec03Optimisation.md b/07performance/sec03Optimisation.md index 04d51d2dc..927bf1a70 100644 --- a/07performance/sec03Optimisation.md +++ b/07performance/sec03Optimisation.md @@ -4,14 +4,16 @@ title: Compiler Optimisation Estimated Reading Time: 45 minutes -# Compiler Optimisation +# Compiler Optimisation and Compile-Time Evaluation -Compilation is the translation of our high level code (in this case C++) into machine code that reflects the instruction set of the specific hardware for which it is compiled. This machine code can closely reflect the C++ code, implementing everything explicitly the way that it's written, or it can be quite different from the structure and form of the C++ code **as long as it produces an equivalent program**. The purpose of this restructuring is to provide optimisations, usually for speed. Modern compilers have a vast array of optimisations which can be applied to code as it is compiled to the extent that few people could write better optimised assembly code manually, a task that rapidly becomes infeasible and forbiddingly time consuming as projects become larger and more complex. +Compilation is the translation of our high level code (in this case C++) into machine code that reflects the instruction set of the specific hardware for which it is compiled. This machine code can closely reflect the C++ code, implementing everything explicitly the way that it's written, or it can be quite different from the structure and form of the C++ code **as long as it produces an equivalent program***. The purpose of this restructuring is to provide optimisations, usually for speed. Modern compilers have a vast array of optimisations which can be applied to code as it is compiled to the extent that few people could write better optimised assembly code manually, a task that rapidly becomes infeasible and forbiddingly time consuming as projects become larger and more complex. There is another benefit to automated compiler optimisation. Compilers, by necessity, do produce hardware specific output, as they must translate programs into the instruction set of a given processor. 
This means that even if we have written highly portable code which makes no hardware specific optimisations, we can still benefit from these optimisations if they can be done by the compiler when compiling our code for different targets! As we shall see below, some processors may have different features such as machine level instructions for vectorised arithmetic which can be implemented by the compiler without changing the C++ code, producing different optimised programs for different hardware from a single, generic C++ code. As such, to get the best out of our C++ code we need to rely to some extent on automated optimisation by the compiler. This does not mean that we should not choose effective algorithms -- the compiler will not simply replace a slow sorting algorithm with a better one! -- but rather compiler optimisation should be used in conjunction with our own best practices for writing efficient software.

+> *_What is considered an "equivalent program" is beyond the scope of this course, and falls under the field of programming language semantics (the study of the meaning of programs). For now you can take equivalence to mean that the results of two computations are the same whenever they are provided with the same external inputs. Things like the CPU clock, and therefore timing information, would qualify as external inputs, and so changing the timing results of a computation doesn't change the meaning of the program._
+
 ## Optimisation Trade Offs

 Code with optimisations applied will generally run faster, but there are a number of other impacts that it can also have that are worth bearing in mind when selecting appropriate optimisations to apply.
@@ -25,6 +27,113 @@ Code with optimisations applied will generally run faster, but there are a number of other impacts that it can also have that are worth bearing in mind when selecting appropriate optimisations to apply.
 4. Standards Compliance.
     - Some optimisations are not compliant with floating point standards; in particular they may affect floating point computations by rearranging numerical expressions ("free re-associations"). Using these can jeopardise the accuracy of your programs.

+## Constant Expressions and Compile-Time Evaluation
+
+We can tell the compiler to do some computations at compile-time instead of during run-time. Consider some simple code like this:
+
+```cpp
+int x = 5;
+int y = x*x + 12;
+int z = factorial(5);
+```
+
+The variables $y$ and $z$ are the result of simple, deterministic expressions that depend only on information that we have at compile time. An equivalent program could look like this:
+
+```cpp
+int x = 5;
+int y = 37;
+int z = 120;
+```
+
+This program clearly doesn't need to do any work at run-time in order to assign values to `y` and `z`, but we have lost the expressiveness of our original version (which makes the relationships between `x`, `y`, and `z` clear), and we would have to update `y` and `z` manually if we changed the initialisation of `x`.
+
+**Constant expressions** provide us with a way of writing expressive code like the first example _and_ forcing the compiler to evaluate the expressions and replace them with their results at compile time. This is particularly useful for more complex functions that could be time-consuming at runtime and are less likely to be automatically optimised by the compiler.
+
+### Constant Expression Syntax
+
+A variable or a function can be declared as a constant expression using `constexpr`.
For example:
+
+```cpp
+constexpr int add(int a, int b)
+{
+    int c = a + b;
+    return c;
+}
+
+int main()
+{
+    constexpr int x = 5;
+    constexpr int y = add(x, 18);
+
+    return 0;
+}
+```
+
+The initialisation of `x` is essentially equivalent to `const int x = 5` in this case. There is an additional restriction on `constexpr` compared to `const` however, which is that the initialisation of a `const` variable can happen at runtime, and depend on the run-time state, whereas the initialisation of a `constexpr` variable must be able to be performed at compile time.
+
+The use of `constexpr` for the function `add` enforces that this function can be evaluated at compile time and will not rely on any runtime information. This means that the variable assignment for `y` does not require runtime calculation: the compiler will simplify `add(x, 18)` to `23` and simply assign the value to `y` without ever calling the function when the program is run.
+
+### Limitations of Constant Expressions
+
+Not all computations can be done at compile-time, and therefore there are a number of conditions that constant expressions must fulfil. The [complete list of conditions](https://en.cppreference.com/w/cpp/language/constexpr.html) is quite involved, but here are some key points to remember:
+
+- A `constexpr` function cannot contain `try` statements, since handling exceptions would require run-time state.
+    - You can _write_ a `throw` statement in a `constexpr` function but if the `throw` statement is reached during the compile time evaluation you will get a compiler error. This is useful because it will warn you if there is an error case in a compile-time evaluation, and because you can also call `constexpr` functions at run-time as well, which can then handle the exception.
+- A `constexpr` function cannot include uninitialised variables (e.g. `int z;`).
+- A `constexpr` function cannot declare `static` variables, since they require maintaining a state between function calls.
+- A `constexpr` function cannot call a non-`constexpr` function. This includes things like dynamic memory allocation with `new`. However since a `constexpr` function can call other `constexpr` functions, you can also write a _recursive_ `constexpr` function!
+- A `constexpr` function cannot declare a _non-literal_ type variable. [Literal types](https://en.cppreference.com/w/cpp/language/constant_expression.html#Literal_type) need to fulfil a variety of conditions; in essence they are simple types that can be worked with at compile time, so they must be able to be constructed using a `constexpr` constructor, have trivial destructors (no custom destructor logic needs to be called), and not contain member variables of non-literal types. An example of a non-literal type is `std::vector` since it has non-trivial destruction logic (heap memory deallocation), but `std::array` _is_ a literal type since its size is known at compile time and so it can be stack-allocated and doesn't require any specialised destruction logic.
+- A `constexpr` variable must be a literal type, and must be initialised by a constant expression e.g. a call to a `constexpr` function with arguments _known at compile time_ (e.g. an explicit number like `2.4` or a `const`/`constexpr` variable known at compile-time).
+
+If your function is a _pure function_ that doesn't involve exception handling, dynamic memory allocation/deallocation, or calls to non-`constexpr` functions, then you can probably turn it into a `constexpr` function.
+
+**N.B. We are using C++17 on this course.
The conditions for acceptable constant expressions vary across different C++ standards, so if you use a newer or older standard you may find some differences in what you are able to write and compile.**
+
+### Revisiting the Factorial Example
+
+Consider the following `constexpr` definition of `factorial`, where we have omitted `#include` statements and `std::` namespacing for brevity.
+
+```cpp
+constexpr int factorial(int x)
+{
+    if(x < 0)
+    {
+        throw domain_error("Value " + to_string(x) + " is not within the domain of factorial (x >= 0).");
+    }
+    else if(x == 0)
+    {
+        return 1;
+    }
+    else
+    {
+        return x*factorial(x-1);
+    }
+}
+```
+Notice that we are able to make use of branching logic with separate `return` statements and recursion in this definition.
+
+Now consider the following statements that could appear in `main`, and whether or not they will compile:
+
+```cpp
+int main()
+{
+    const int x = 5;
+    int y = 3;
+
+    constexpr int z = factorial(x);  // Compiles okay: x is const
+    constexpr int r = factorial(y);  // Doesn't compile: y is not const
+    constexpr int p = factorial(-3); // Doesn't compile: throw is reached
+    int q = factorial(-3);           // Compiles okay: throws an exception at runtime
+
+    return 0;
+}
+```
+Notice that we can only initialise a `constexpr int` variable with a call to `factorial` if the **argument is `const` and the function evaluation does not `throw`**. If we initialise a non-`const` variable using `factorial` then it will be treated as a runtime expression and so these conditions don't apply.
+
+### Other Uses of Constant Expressions
+
+Constant expressions are not just useful for run-time optimisation; they also allow us to write more expressive code anywhere where we need to know information at compile time, such as constant template arguments (e.g. the length of an `std::array`) and static memory allocation (e.g. `int x[...];`).
+
 ## Compiler Optimisation Flags

 The GNU compiler (gcc) has a [large number of optimisation options](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html), but for the most part one uses a smaller set of flags which enable batches of these options. These batches are selected to give you control over some of the downsides of optimisation procedures discussed above.
diff --git a/08openmp/02_intro_openmp.md b/08openmp/02_intro_openmp.md
index 67b34c8ca..9c942a676 100644
--- a/08openmp/02_intro_openmp.md
+++ b/08openmp/02_intro_openmp.md
@@ -252,7 +252,7 @@ Result: 2
 A reduction can be applied to a number of constructs but you'll rarely use it outwith a `parallel for`, certainly for this course. The general syntax looks like:

 ```
-reduction(<operator> : <variable>)
+reduction(<operator> : <variables>)
 ```

 `<operator>` can be replaced with one of:
@@ -265,7 +265,7 @@ reduction(<operator> : <variable>)

 While it's useful to know what's available, you'll probably find yourself using only the arithmetic operators and `max` or `min`.

-`<variable>` can be a variable of any type *that supports the given operator*. I tend to limit these variables to built-in types mainly because the reduction operator implicitly copies the given variable which can be tricky to handle for complex types or classes. If you want to use complex types, you must be careful to make sure the type has copy constructors that manage any type-owned resources appropriately.
+`<variable>` can be a variable of any type *that supports the given operator*; you can list **multiple variables in a comma-separated list** if you have multiple reductions with the same operator happening in the same loop.
If you have reductions over different variables _and_ with different reduction operators then you can add more than one reduction clause, one after the other. I tend to limit these variables to built-in types mainly because the reduction operator implicitly copies the given variable which can be tricky to handle for complex types or classes. If you want to use complex types, you must be careful to make sure the type has copy constructors that manage any type-owned resources appropriately. For (much) more detailed information on everything I haven't mentioned about the reduction clause, see [the reduction clause in the specification](https://www.openmp.org/spec-html/5.0/openmpsu107.html). diff --git a/09distributed_computing/sec01DistributedMemoryModels.md b/09distributed_computing/sec01DistributedMemoryModels.md index 825bd7893..86ee47b7a 100644 --- a/09distributed_computing/sec01DistributedMemoryModels.md +++ b/09distributed_computing/sec01DistributedMemoryModels.md @@ -89,13 +89,14 @@ Let's illustrate this game of life example using just two processes, $P$ and $Q$ - We cannot say which process will begin or complete its update first, or send its boundary cell data first. It does not matter! The processes are kept synchronised as much as is necessary by the message passing. - If one process is faster than the other or the message passing latency is high, then one or more process will stall while waiting to receive the data that it needs. -## Performance and Message Passing +## Estimating Performance in Message Passing Message passing naturally incurs a performance overhead. Data communication channels between processes are generally speaking much slower than straight-forward reads to RAM. As such, when designing distributed systems we should bear in mind: - The frequency of message passing should be kept down where possible. - The size of messages should be kept down where possible. - In general, a smaller number of large messages is better than a large number of small messages _for a given amount of data_. - - This is true in general of data movement, whether through message passing or memory reads to RAM or hard disks. Loosely speaking, data movement general involves a latency ($L$) and bandwidth ($B$), such that the time for $N$ bytes of data to be transferred is $\sim N/B + L$. If we send this data in $k$ separate messages, we will incur a $kL$ latency penalty instead of just $L$. + - This is true in general of data movement, whether through message passing or memory reads to RAM or hard disks. Data movement generally involves a latency ($L$), which is a time overhead for every message regardless of size, and bandwidth ($B$), which is the amount of data that can be transferred in a given time. +- The time for $N$ bytes of data to be transferred in $k$ messages can be estimated using $t \approx N/B + kL$. - If you have to choose between sending a smaller amount of total data in a larger number of messages, or a larger amount of data using a smaller number of messages, then which you should pick will depend on which term in this expression becomes dominant!
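+
+As a rough illustration (the numbers here are invented for the example), suppose a link has latency $L = 10\,\mu\mathrm{s}$ and bandwidth $B = 1\,\mathrm{GB/s}$. Sending $N = 10\,\mathrm{MB}$ as a single message takes roughly $N/B + L \approx 10\,\mathrm{ms}$, dominated by the bandwidth term, whereas splitting the same data into $k = 10000$ small messages adds $kL = 100\,\mathrm{ms}$ of latency, which now dominates the total transfer time.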