C Backwards-Compatibility, Generalization, Etc. for matmul selfcontained.cc #2
First and foremost, your Algorithmica article was very detailed and useful; thank you for that.
This pull request leaves the overall implementation mostly the same, but expands upon it in some other ways:
- Considering that the .cc file already uses some C-styled code and functions such as `printf()`, as well as the fact that the Codeforces blog refers to it as "... under 40 lines of C," the code here also had to be compatible with C. This has been done by defining C++'s `std::min()` as a ternary C macro `MIN()`, while also changing some of the typecast syntax (e.g., `float(x)` --> `(float)x`) and array initialization syntax (e.g., `t[i][j]{0}` --> `t[i][j] = {0}`) to be backwards-compatible with C.
- To make the code's usage more accessible, the global variables were eliminated by moving them into local variables and function parameters. This could also be a step towards thread-safety for multithreading.
- Combines and ports `matmul()`'s generalization patches (i.e., being able to operate on square matrices whose sizes are not strictly multiples of 48) from v5-unrolled.cc and v6.cc to here.
- Adds basic error-handling using `errno.h` to the allocator.
- Adds additional type qualifiers and compiler optimization hints (e.g., more `const`s & `unsigned`s for immutable and non-negative variables, `#define`s for compile-time constants, `int_fastN_t`s for potential RISC applications, GCC `__attribute__` and `inline` hints, etc.)
- Documents most other parts of the code and renames some of the variables (e.g., `vec t[6][2]` --> `v8sf simd_registers[KERNEL_ROWS][KERNEL_COLS]`) to be more descriptive and somewhat easier to modify later on.
- Adds an extra check at the end of the `main()` driver that compares the results of `naive()` and `matmul()` against each other, with a rounding error tolerance of up to `0.0010f`.
- Extends `timeit()` to also take the parameters of its callee `f()` and return its return value.

Everything considered, these patches do not seem to regress the performance. The only possible scenario would be that a very, very large matrix is supplied and a small fraction of the time gets spent on aligned memory allocation on the heap. In return, there won't be a stack overflow due to allocation on the stack, and end users aren't required to pre-align their buffers (matrices) into global variables themselves.
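
For reviewers who want a quick picture of the C-compatible pieces described above, here is a minimal sketch. It is not the code from this pull request: `round_up()`, `alloc_matrix()`, `matrices_match()`, and the `ALIGNMENT` value are hypothetical names and assumptions chosen for illustration, and padding up to a multiple is only one possible way to handle sizes that are not multiples of 48. Only `MIN()` and the `0.0010f` tolerance come from the description.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Ternary stand-in for C++'s std::min(). Each argument may be evaluated
 * twice, so avoid passing expressions with side effects. */
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

#define ALIGNMENT 32u      /* assumption: 32-byte alignment for 8-float vectors */
#define TOLERANCE 0.0010f  /* rounding tolerance quoted in the description */

/* Hypothetical helper: round n up to the next multiple of `multiple`. */
static size_t round_up(size_t n, size_t multiple)
{
    assert(multiple != 0);
    return (n + multiple - 1) / multiple * multiple;
}

/* Hypothetical allocator: returns a zeroed, ALIGNMENT-aligned buffer that
 * holds at least `count` floats, or NULL with errno set on failure. */
static float *alloc_matrix(size_t count)
{
    /* Reject empty requests and sizes whose byte count would overflow. */
    if (count == 0 || count > (SIZE_MAX - ALIGNMENT) / sizeof(float)) {
        errno = EINVAL;
        return NULL;
    }
    /* C11 aligned_alloc() expects a size that is a multiple of the alignment. */
    size_t bytes = round_up(count * sizeof(float), ALIGNMENT);
    float *buffer = aligned_alloc(ALIGNMENT, bytes);
    if (buffer == NULL) {
        errno = ENOMEM;
        return NULL;
    }
    memset(buffer, 0, bytes);
    return buffer;
}

/* Returns 1 when every pair of elements differs by at most TOLERANCE,
 * and 0 otherwise. A NaN difference counts as a mismatch. */
static int matrices_match(const float *a, const float *b, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        float diff = a[i] - b[i];
        if (diff < 0.0f)
            diff = -diff;
        if (!(diff <= TOLERANCE))
            return 0;
    }
    return 1;
}
```

A driver would call `alloc_matrix()` once per matrix, check for `NULL` and report the failure via `errno`, run both multiplication routines, compare the outputs with `matrices_match()`, and release each buffer with `free()`.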