Description
Consider code generated by structtensor similar to the following:
void fn(double * f, double * w, double ** X, int L, int C) {
    ...
    for (int j = 0; j < C; ++j)
        for (int i = 0; i < L; ++i)
            f[j] += (w[i] * X[i][j]);
    ...
}
Each access to X[i][j] first loads the pointer stored at X[i] and then loads the element that pointer refers to. The number of dependent memory accesses per element grows with the number of dimensions of the X data structure, which hurts overall performance.
It would be more efficient if the generated code looked like this:
void fn(double * f, double * w, double * X, int L, int C) {
    ...
    for (int j = 0; j < C; ++j)
        for (int i = 0; i < L; ++i)
            f[j] += (w[i] * X[i * C + j]);
    ...
}
Flattening multidimensional variables would reduce each element access to a single memory load per data structure and improve the performance of the core computation.
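To make the comparison concrete, here is a minimal sketch of both variants side by side, assuming X is stored in row-major order with L rows and C columns. The names fn_nested and fn_flat are mine, chosen only for illustration, and the elided parts of the generated function are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer-to-pointer layout: each element access loads X[i] (a pointer)
 * and then X[i][j] (the value), so two dependent loads per element. */
void fn_nested(double *f, const double *w, double **X, int L, int C) {
    assert(L >= 0 && C >= 0);
    for (int j = 0; j < C; ++j)
        for (int i = 0; i < L; ++i)
            f[j] += w[i] * X[i][j];
}

/* Flat layout (assumed row-major): element (i, j) lives at index i * C + j,
 * so each element access is one load from a single contiguous buffer. */
void fn_flat(double *f, const double *w, const double *X, int L, int C) {
    assert(L >= 0 && C >= 0);
    for (int j = 0; j < C; ++j)
        for (int i = 0; i < L; ++i)
            f[j] += w[i] * X[(size_t)i * C + j];
}
```

Both functions accumulate the same values into f when given the same data. How much faster the flat version is would need to be measured, since it depends on the compiler, the loop order, and the cache behavior of the target machine.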
In addition, this layout would match the convention used by external data manipulation libraries (e.g. numpy.ndarray), which would make it easier to integrate structtensor into established pipelines.
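As an illustration of the integration cost today, a caller that already holds a contiguous row-major buffer has to build a table of row pointers before it can call the double ** version. The sketch below shows one way that adapter might look; make_row_pointers is a hypothetical helper, not part of structtensor, and it assumes the buffer holds L * C doubles in row-major order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical adapter: builds an array of L row pointers into an existing
 * contiguous row-major buffer of L * C doubles. No element data is copied.
 * The caller owns the returned table and must release it with free(). */
double **make_row_pointers(double *data, int L, int C) {
    assert(data != NULL);
    if (L <= 0 || C <= 0)
        return NULL;
    double **rows = malloc((size_t)L * sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (int i = 0; i < L; ++i)
        rows[i] = data + (size_t)i * C;  /* start of row i */
    return rows;
}
```

With a flat signature this extra allocation and bookkeeping would not be needed, because the contiguous buffer could be passed to the generated function directly.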
I think this is worth doing, since it would improve both usability and performance. It could even be an optional feature enabled through a command-line parameter. What do you think?