
@Thraetaona
First and foremost, your Algorithmica article was very detailed and useful; thank you for that.

This pull request leaves the overall implementation mostly the same, but expands upon it in some other ways:

  1. Because the .cc file already uses C-style code and functions such as printf(), and the Codeforces blog describes it as "... under 40 lines of C," the code here should also compile as C. To that end, C++'s std::min() is replaced with a ternary C macro, MIN(), and the typecast syntax (e.g., float(x) --> (float)x) and array initialization syntax (e.g., t[i][j]{0} --> t[i][j] = {0}) are changed to forms that are also valid C.

  2. To make the code easier to reuse, the global variables were eliminated by turning them into local variables and function parameters. This is also a step towards thread safety for multithreaded use.

  3. Combines the generalization patches for matmul() (i.e., support for square matrices whose sizes are not strictly multiples of 48) from v5-unrolled.cc and v6.cc, and ports them here.

  4. Adds basic error-handling using errno.h to the allocator.

  5. Adds additional type qualifiers and compiler optimization hints (e.g., more consts & unsigneds for immutable and non-negative variables, #defines for compile-time constants, int_fastN_ts for potential RISC applications, GCC __attribute__ and inline hints, etc.)

  6. Documents most other parts of the code and renames some of the variables (e.g., vec t[6][2] --> v8sf simd_registers[KERNEL_ROWS][KERNEL_COLS]) to be more descriptive and somewhat easier to modify later on.

  7. Adds an extra check at the end of the main() driver that compares the results of naive() and matmul() against each other, with a rounding-error tolerance of up to 0.0010f.

  8. Extends timeit() to also take the parameters of its callee f() and return its return value.
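
As a rough illustration of items 1, 3, and 7, here is a minimal sketch in plain C. It is not the code from this pull request: round_up() and matrices_match() are hypothetical helper names, and rounding a dimension up to a multiple of the kernel size is only an assumption about how the generalization could work.

```c
#include <assert.h>
#include <stddef.h>

/* C stand-in for std::min(). Each argument may be evaluated twice,
   so avoid passing expressions with side effects. */
#define MIN(a, b) (((a) < (b)) ? (a) : (b))

/* Rounding-error tolerance used when comparing results (item 7). */
#define TOLERANCE 0.0010f

/* Hypothetical helper: rounds n up to the next multiple of `multiple`,
   e.g. to pad a matrix dimension to the kernel size (an assumption). */
static size_t round_up(size_t n, size_t multiple)
{
    assert(multiple != 0);
    return (n + multiple - 1) / multiple * multiple;
}

/* Hypothetical helper: returns 1 if two n-by-n row-major matrices agree
   element-wise within TOLERANCE, and 0 otherwise. */
static int matrices_match(const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n * n; i++) {
        float diff = a[i] - b[i];
        if (diff < 0.0f)
            diff = -diff;
        if (diff > TOLERANCE)
            return 0;
    }
    return 1;
}
```

For instance, round_up(50, 48) yields 96, so a 50x50 input would be padded to 96x96 under this assumption. The absolute difference is computed by hand so that the sketch does not need to link against the math library.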

All things considered, these patches do not seem to regress performance. The only possible exception is a very, very large matrix, where a small fraction of the time is spent on aligned memory allocation on the heap. In exchange, heap allocation avoids the stack overflow that a large stack allocation could cause, and end users are not required to pre-align their buffers (matrices) in global variables themselves.
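
To make the allocation trade-off concrete, here is a minimal sketch of an aligned heap allocator with errno-based error reporting. It is not the allocator from this pull request: alloc_matrix() and ALIGNMENT are hypothetical names, the 32-byte alignment assumes 256-bit AVX registers, and C11's aligned_alloc() is assumed to be available (it is not on every platform).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed alignment in bytes: the width of one 256-bit AVX register. */
#define ALIGNMENT 32

/* Hypothetical allocator: returns a zeroed, ALIGNMENT-aligned n-by-n float
   matrix, or NULL with errno set (EINVAL for a bad size, ENOMEM otherwise).
   The caller releases the buffer with free(). */
static float *alloc_matrix(size_t n)
{
    /* Reject empty matrices and sizes whose byte count would overflow. */
    if (n == 0 || n > (SIZE_MAX - ALIGNMENT) / sizeof(float) / n) {
        errno = EINVAL;
        return NULL;
    }

    /* aligned_alloc() expects the size to be a multiple of the alignment. */
    size_t bytes = n * n * sizeof(float);
    bytes = (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;

    float *matrix = aligned_alloc(ALIGNMENT, bytes);
    if (matrix == NULL) {
        /* The C standard does not require errno to be set here. */
        errno = ENOMEM;
        return NULL;
    }

    assert((uintptr_t)matrix % ALIGNMENT == 0);
    memset(matrix, 0, bytes);
    return matrix;
}
```

A caller would check the result for NULL, report the failure (for example with perror()), and otherwise pass the buffer on and free() it afterwards. Because the buffer lives on the heap, its size is limited by available memory rather than by the stack.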

1) Makes the C++ code backward compatible with C by defining std::min and fixing some of the typecasting and array initialization syntaxes.
2) Eliminates the use of global variables as a step towards thread-safety.
3) Combines and ports the generalization patches of matmul() from v5-unrolled.cc and v6.cc to here.
4) Adds basic error-handling to the allocator.
5) Adds additional type qualifiers and compiler optimization hints (e.g., const for immutable variables, #define, int_fastN_t, __attribute__, etc.)
6) Documents other parts of the code.
7) Adds an extra check at the end of main() to double-check the results of both naive() and matmul() with an error tolerance of 0.0010f.
8) Extends timeit() to take additional parameters and to return the return value of its callee f().
@Thraetaona Thraetaona changed the title C Backwards-Compatibility, Generalization, Etc. C Backwards-Compatibility, Generalization, Etc. for matmul selfcontained.cc Dec 11, 2022