
More granular latency buffer sizes #12

@AndreiLux


I've been trying to modify the latency benchmark to include more granular buffer access sizes in order to get a smoother latency curve, but it seems I don't correctly understand how the algorithm works.

In particular, I modified the main control loop so that every other iteration inserts a test size equal to the previous full 2^n size plus half of it, giving the sequence 1024, 1536, 2048, 3072, ...:

....
    else
        printf("\n");

    nbits = 10;
    
    for (niter = 0; (1 << nbits) <= size; niter++)
    {
        int testsize;

        if (niter % 2 == 0)
            testsize = (1 << nbits++);                              /* even iterations: the full 2^n size */
        else
            testsize = (1 << (nbits - 1)) + (1 << (nbits - 1)) / 2; /* odd iterations: 1.5x the previous 2^n size */

        xs1 = xs2 = ys = ys1 = ys2 = 0;

....

            t_before = gettime();
            random_read_test(buffer + testoffs, count, testsize);
            t_after = gettime();

.....

static void __attribute__((noinline)) random_read_test(char *zerobuffer,
                                                       int count, int testsize)
{
    uint32_t seed = 0;
    uintptr_t addrmask = testsize - 1;

This gives me the additional intermediate sizes that I wanted:

block size : single random read / dual random read
L1 :   10.9 ns          /    15.9 ns
      1024 :    0.0 ns          /     0.0 ns
      1536 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      3072 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      6144 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     12288 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     24576 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     49152 :    0.0 ns          /     0.0 ns
     65536 :    4.1 ns          /     6.1 ns
     98304 :    4.0 ns          /     6.1 ns
    131072 :    6.1 ns          /     8.0 ns
    196608 :    6.1 ns          /     8.0 ns
    262144 :   10.7 ns          /    13.6 ns
    393216 :   10.7 ns          /    13.6 ns
    524288 :   13.2 ns          /    16.1 ns
    786432 :   13.2 ns          /    16.1 ns
   1048576 :   22.4 ns          /    22.5 ns
   1572864 :   22.2 ns          /    24.8 ns
   2097152 :   93.2 ns          /   116.1 ns
   3145728 :   93.1 ns          /   115.4 ns
   4194304 :  123.7 ns          /   147.0 ns
   6291456 :  121.9 ns          /   145.3 ns
....

But as you can see in the figures, the latencies don't actually change from the previous full 2^n size.

Looking at the code in random_read_test, I see that you limit the access pattern to a given memory range by simply masking the randomized index with a defined address mask. I of course changed the parameters as shown above so that the proper testsize is passed in instead of just nbits.
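
To make sure I'm describing the mechanism correctly, here is a simplified sketch of how I understand the masking to work (this is not the actual random_read_test code; the function name and the LCG constants are just placeholders for illustration):

    #include <stdint.h>

    /* Sketch only: a pseudo-random index is ANDed with addrmask so that
     * only the first testsize bytes of the buffer are ever touched. */
    static char masked_random_reads(const char *buffer, int count, int testsize)
    {
        uint32_t seed = 0;
        uintptr_t addrmask = testsize - 1;
        char v = 0;
        while (count-- > 0)
        {
            seed = seed * 1103515245 + 12345;  /* placeholder LCG step */
            v |= buffer[seed & addrmask];      /* index limited to the masked range */
        }
        return v;
    }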

The resulting behaviour should theoretically work, but obviously I'm missing something because it doesn't. As far as I can see this shouldn't be an issue of the LCG (I hope). Do you have any input on my modifications, or any feedback on other ways to change your random_read_test so that it accepts test sizes other than 2^n?
