@victorjulien victorjulien commented Dec 15, 2025

#14295 + #11725 + more Intel SIMD experiments with unrolling loops, etc.

My conclusion about the SIMD work is that it's not really worth it for exact memcmp: the SIMD implementations are sometimes faster than libc in certain tests, but overall libc memcmp is just much better.

For MemcmpLowercase we seem to have some success on Intel with the SSE3 implementation. NEON on Arm seems mostly better than the non-SIMD version.
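
For readers unfamiliar with the lowercase variant, here is a minimal scalar sketch of what such a comparison does. It assumes the pattern is already lowercase and that only ASCII letters are folded; the function name `memcmp_lowercase_scalar` is hypothetical and is not the code from this PR. The SIMD variants benchmarked below would do the same work on 16 or more bytes per step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name, for illustration only.
 * Returns 0 when the first `len` bytes of `input`, folded to lowercase,
 * equal `pattern_lc` (assumed to be lowercase already), and 1 otherwise. */
static int memcmp_lowercase_scalar(const uint8_t *pattern_lc,
                                   const uint8_t *input, size_t len)
{
    assert(pattern_lc != NULL && input != NULL);
    for (size_t i = 0; i < len; i++) {
        uint8_t c = input[i];
        /* ASCII-only fold: 'A'..'Z' becomes 'a'..'z', all else untouched. */
        if (c >= 'A' && c <= 'Z')
            c |= 0x20;
        if (c != pattern_lc[i])
            return 1;
    }
    return 0;
}
```

Folding only the `'A'..'Z'` range matters: blindly OR-ing every byte with `0x20` would make `[` and `{` compare equal.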

@AGSaidi SVE doesn't seem worth it here.

These are the numbers from an AWS Graviton 3 instance:

Test MemcmpTestExactLibcMemcmp                                    : real:   4 - syn:  18 - stream1:   2 - stream2:   2 - 64-m:     3403k - 64-nm:     3375k - 1418-m:    40790k - 1418-nm:    41682k - 9000-m:   244441k - 9000-nm:   244276k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:   4 - syn:  18 - stream1:   2 - stream2:   2 - 64-m:     3377k - 64-nm:     3363k - 1418-m:    40771k - 1418-nm:    41061k - 9000-m:   244458k - 9000-nm:   244357k - pass
Test MemcmpTestExactSCMemcmpSVE                                   : real:   2 - syn:  19 - stream1:   1 - stream2:   1 - 64-m:     5281k - 64-nm:     4777k - 1418-m:   104382k - 1418-nm:    89588k - 9000-m:   661240k - 9000-nm:   663112k - pass
Test MemcmpTestLowercaseDefault                                   : real:   3 - syn:  36 - stream1:   1 - stream2:   1 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:   3 - syn:  37 - stream1:   1 - stream2:   1 - pass
Test MemcmpTestLowercaseNeon                                      : real:   3 - syn:  21 - stream1:   1 - stream2:   1 - pass

A bit better in the small tests, far worse in the bigger checks. I see similar on Intel.

Interestingly my new hardware, a Minisforum R1, is way worse (EDIT: on clang only, gcc is fine. See below):

Test MemcmpTestExactLibcMemcmp                                    : real:  58 - syn: 484 - stream1:  25 - stream2:  24 - 64-m:    24914k - 64-nm:    24829k - 1418-m:   390986k - 1418-nm:   415003k - 9000-m:  2387742k - 9000-nm:  2389435k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  58 - syn: 491 - stream1:  25 - stream2:  24 - 64-m:    24740k - 64-nm:    24829k - 1418-m:   391280k - 1418-nm:   415567k - 9000-m:  2388764k - 9000-nm:  2389470k - pass
Test MemcmpTestExactSCMemcmpSVE                                   : real:  83 - syn: 608 - stream1:  50 - stream2:  50 - 64-m:   158161k - 64-nm:   148893k - 1418-m:  3113348k - 1418-nm:  3109991k - 9000-m: 19614734k - 9000-nm: 19608859k - pass
Test MemcmpTestLowercaseDefault                                   : real:  91 - syn: 1571 - stream1:  46 - stream2:  44 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  85 - syn: 1558 - stream1:  39 - stream2:  38 - pass
Test MemcmpTestLowercaseNeon                                      : real: 261 - syn: 1257 - stream1: 225 - stream2: 299 - pass

The SVE/Neon case is about 8x slower here.

For reference, here is the result for an Intel W-2245:

Test MemcmpTestExactLibcMemcmp                                    : real:  15 - syn:  32 - stream1:   6 - stream2:   6 - 64-m:    11702k - 64-nm:    12538k - 1418-m:    48556k - 1418-nm:    51330k - 9000-m:   347806k - 9000-nm:   250748k - pass                            
Test MemcmpTestExactSCMemcmpDefault                               : real:  17 - syn:  36 - stream1:   9 - stream2:   7 - 64-m:    12542k - 64-nm:    13377k - 1418-m:   199509k - 1418-nm:   199161k - 9000-m:  1207344k - 9000-nm:  1208104k - pass                            
Test MemcmpTestExactSCMemcmpSSE3                                  : real:  15 - syn:  39 - stream1:   8 - stream2:   6 - 64-m:    18472k - 64-nm:    18378k - 1418-m:   320602k - 1418-nm:   322769k - 9000-m:  1908235k - 9000-nm:  1910351k - pass                            
Test MemcmpTestExactSCMemcmpSSE42                                 : real:  18 - syn:  46 - stream1:   7 - stream2:  11 - 64-m:    29278k - 64-nm:    26131k - 1418-m:   552968k - 1418-nm:   553742k - 9000-m:  3329029k - 9000-nm:  3415155k - pass                            
Test MemcmpTestExactSCMemcmpAVX2                                  : real:  17 - syn:  33 - stream1:   8 - stream2:   8 - 64-m:    10862k - 64-nm:    12819k - 1418-m:   121476k - 1418-nm:   123271k - 9000-m:   732115k - 9000-nm:   733321k - pass                            
Test MemcmpTestExactSCMemcmpAVX2_512                              : real:  18 - syn:  33 - stream1:  13 - stream2:   6 - 64-m:     6690k - 64-nm:     6681k - 1418-m:    71777k - 1418-nm:    72233k - 9000-m:   387276k - 9000-nm:   392116k - pass                            
Test MemcmpTestExactSCMemcmpAVX2_1024                             : real:  18 - syn:  34 - stream1:   8 - stream2:  10 - 64-m:     9193k - 64-nm:     8486k - 1418-m:   115554k - 1418-nm:   115315k - 9000-m:   686084k - 9000-nm:   685958k - pass                            
Test MemcmpTestExactSCMemcmpAVX512_128                            : real:  15 - syn:  35 - stream1:   9 - stream2:   8 - 64-m:    13369k - 64-nm:    14257k - 1418-m:   246592k - 1418-nm:   249189k - 9000-m:  1435839k - 9000-nm:  1435080k - pass                            
Test MemcmpTestExactSCMemcmpAVX512_256                            : real:  20 - syn:  37 - stream1:  10 - stream2:   8 - 64-m:    13377k - 64-nm:    12447k - 1418-m:   235589k - 1418-nm:   235900k - 9000-m:  1445400k - 9000-nm:  1446368k - pass                            
Test MemcmpTestExactSCMemcmpAVX512_512                            : real:  18 - syn:  35 - stream1:  10 - stream2:   7 - 64-m:     8187k - 64-nm:     7856k - 1418-m:   105123k - 1418-nm:   104660k - 9000-m:   596484k - 9000-nm:   595608k - pass                            
Test MemcmpTestExactSCMemcmpAVX512_2048                           : real:  21 - syn:  41 - stream1:  10 - stream2:  13 - 64-m:     8840k - 64-nm:     7857k - 1418-m:    73167k - 1418-nm:    70756k - 9000-m:   445335k - 9000-nm:   439933k - pass                            
Test MemcmpTestExactSCMemcmpAVX512_4096                           : real:  25 - syn:  46 - stream1:  19 - stream2:  17 - 64-m:    11786k - 64-nm:    10805k - 1418-m:    73091k - 1418-nm:    69195k - 9000-m:   378118k - 9000-nm:   386404k - pass                            
Test MemcmpTestExactSCMemcmpAVX512_6144                           : real:  28 - syn:  49 - stream1:  19 - stream2:  24 - 64-m:    13777k - 64-nm:    12765k - 1418-m:    76809k - 1418-nm:    77284k - 9000-m:   407007k - 9000-nm:   409740k - pass                            
Test MemcmpTestLowercaseDefault                                   : real:  21 - syn:  47 - stream1:  14 - stream2:  12 - pass                                                                                                                                                   
Test MemcmpTestLowercaseNoSIMD                                    : real:  15 - syn: 100 - stream1:   9 - stream2:   8 - pass                                                                                                                                                   
Test MemcmpTestLowercaseSSE3                                      : real:  17 - syn:  43 - stream1:   9 - stream2:   6 - pass                                                                                                                                                   
Test MemcmpTestLowercaseSSE3and                                   : real:  17 - syn:  42 - stream1:   8 - stream2:   6 - pass                                                                                                                                                   
Test MemcmpTestLowercaseSSE3andload                               : real:  16 - syn:  42 - stream1:   9 - stream2:   6 - pass                                                                                                                                                   
Test MemcmpTestLowercaseSSE42                                     : real:  19 - syn: 108 - stream1:   8 - stream2:  15 - pass                                                                                                                                                   
Test MemcmpTestLowercaseAVX2                                      : real:  18 - syn:  43 - stream1:  11 - stream2:   8 - pass                                                                                                                                                   
Test MemcmpTestLowercaseAVX512_256                                : real:  17 - syn:  45 - stream1:  13 - stream2:   9 - pass                                                                                                                                                   
Test MemcmpTestLowercaseAVX512_512                                : real:  20 - syn:  42 - stream1:  11 - stream2:   9 - pass                                                                                                                                                   

The MemcmpTestLowercaseSSE3 may be the only one worth keeping.

Here is the AMD Ryzen Threadripper PRO 5965WX:

Test MemcmpTestExactLibcMemcmp                                    : real:  21 - syn:  33 - stream1:  13 - stream2:  13 - 64-m:    15112k - 64-nm:    16088k - 1418-m:    51431k - 1418-nm:    52866k - 9000-m:   380400k - 9000-nm:   303679k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  24 - syn:  37 - stream1:  13 - stream2:   9 - 64-m:    12133k - 64-nm:    13154k - 1418-m:   212944k - 1418-nm:   213394k - 9000-m:  1169177k - 9000-nm:  1170395k - pass
Test MemcmpTestExactSCMemcmpSSE3                                  : real:  24 - syn:  40 - stream1:  13 - stream2:   7 - 64-m:    20286k - 64-nm:    20254k - 1418-m:   389818k - 1418-nm:   390903k - 9000-m:  2309240k - 9000-nm:  2310517k - pass
Test MemcmpTestExactSCMemcmpSSE42                                 : real:  24 - syn:  47 - stream1:  14 - stream2:   9 - 64-m:    29349k - 64-nm:    19254k - 1418-m:   408422k - 1418-nm:   410399k - 9000-m:  2331116k - 9000-nm:  2331144k - pass
Test MemcmpTestExactSCMemcmpAVX2                                  : real:  25 - syn:  39 - stream1:  14 - stream2:   8 - 64-m:     8105k - 64-nm:     9092k - 1418-m:   124792k - 1418-nm:   125658k - 9000-m:   605631k - 9000-nm:   606169k - pass
Test MemcmpTestExactSCMemcmpAVX2_512                              : real:  25 - syn:  37 - stream1:  14 - stream2:   8 - 64-m:     7088k - 64-nm:     8119k - 1418-m:    87132k - 1418-nm:    88178k - 9000-m:   462107k - 9000-nm:   463208k - pass
Test MemcmpTestExactSCMemcmpAVX2_1024                             : real:  25 - syn:  39 - stream1:  14 - stream2:  10 - 64-m:     9116k - 64-nm:     8113k - 1418-m:   104972k - 1418-nm:   104450k - 9000-m:   667181k - 9000-nm:   660797k - pass
Test MemcmpTestLowercaseDefault                                   : real:  24 - syn:  40 - stream1:  15 - stream2:   8 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  21 - syn:  96 - stream1:  11 - stream2:  11 - pass
Test MemcmpTestLowercaseSSE3                                      : real:  21 - syn:  45 - stream1:  12 - stream2:   7 - pass
Test MemcmpTestLowercaseSSE3and                                   : real:  21 - syn:  45 - stream1:  11 - stream2:   7 - pass
Test MemcmpTestLowercaseSSE3andload                               : real:  21 - syn:  44 - stream1:  12 - stream2:   6 - pass
Test MemcmpTestLowercaseSSE42                                     : real:  22 - syn: 105 - stream1:  13 - stream2:  11 - pass
Test MemcmpTestLowercaseAVX2                                      : real:  24 - syn:  40 - stream1:  13 - stream2:   8 - pass

AGSaidi and others added 27 commits December 14, 2025 09:06
Rename to match coding style. Update callers.
Systems with SSE 4.1 as the highest SSE version are getting pretty
rare, so it's hard to test.
AVX2 implementation that compares 32 bytes at a time.

Rearrange code to make parts reusable.

Fall back to smaller SIMD for remaining buffer.

When (remaining) buffer is smaller than 32 bytes fall back to other
SIMD implementations that deal with 16 bytes of data per iteration.

Add 16/32/64 byte implementations using AVX512.
Implement for AVX512, AVX2 and SSE42.
Wrapper around `memmem`.

The case sensitive search is implemented by directly calling `memmem`.

As there is no case-insensitive variant available, a wrapper around
memmem is created that takes a sliding-window approach:

1. take a slice of the haystack
2. convert it to lowercase
3. search it using memmem
4. move window forward
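
The four steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the wrapper from the commit: the name `memmem_nocase_window`, the 256-byte window size, and the requirement that the needle is already lowercase are all hypothetical choices. Consecutive windows overlap by `nlen - 1` bytes so that a match straddling a window boundary is not missed.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* memmem() is an extension on some libcs */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_SIZE 256 /* illustrative; must be >= the needle length */

/* Hypothetical wrapper. `needle_lc` is assumed to be lowercase already.
 * Returns a pointer into `haystack` at the first match, or NULL. */
static const uint8_t *memmem_nocase_window(const uint8_t *haystack, size_t hlen,
                                           const uint8_t *needle_lc, size_t nlen)
{
    uint8_t window[WINDOW_SIZE];
    assert(nlen > 0 && nlen <= WINDOW_SIZE);

    size_t off = 0;
    while (hlen - off >= nlen) {
        /* 1. take a slice of the haystack */
        size_t wlen = hlen - off;
        if (wlen > WINDOW_SIZE)
            wlen = WINDOW_SIZE;

        /* 2. convert it to lowercase (ASCII only) */
        for (size_t i = 0; i < wlen; i++) {
            uint8_t c = haystack[off + i];
            window[i] = (c >= 'A' && c <= 'Z') ? (uint8_t)(c | 0x20) : c;
        }

        /* 3. search it using memmem */
        const uint8_t *hit = memmem(window, wlen, needle_lc, nlen);
        if (hit != NULL)
            return haystack + off + (size_t)(hit - window);

        /* 4. move the window forward, keeping nlen - 1 bytes of overlap */
        off += wlen - (nlen - 1);
    }
    return NULL;
}
```

Because the slice is lowercased into a stack buffer, the haystack itself is never modified, at the cost of one extra pass over each window.
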
Tool to benchmark detection engine content inspection, which is the
inspection of individual groups of content, etc. matches for a buffer.

Also add a set of basic tests for the various single pattern matching
implementations.

Output is in CSV: to files for the rule-based tests, to stdout for the
spm tests.
To show differences between 2 result files or between spm algos
in a single result file.
TEST AVX512 6144
Test multiple lengths in each test

Many of the inputs are too short to take SIMD code paths
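
A rough sketch of why testing multiple lengths matters: the hypothetical checker below runs a candidate against libc `memcmp` for every length up to a limit, with a mismatch injected at every position. The names `check_against_libc` and `broken_blocks_only` are illustrative and do not come from the PR. Only the equal versus not-equal outcome is compared, so a candidate that returns 1 instead of a signed difference still passes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*memcmp_fn)(const void *, const void *, size_t);

/* Hypothetical checker: counts disagreements between `candidate` and
 * libc memcmp over all lengths 0..max_len and all mismatch positions. */
static size_t check_against_libc(memcmp_fn candidate, size_t max_len)
{
    enum { CAP = 512 };
    static uint8_t a[CAP], b[CAP];
    assert(max_len <= CAP);

    size_t failures = 0;
    for (size_t len = 0; len <= max_len; len++) {
        for (size_t i = 0; i < len; i++)
            a[i] = b[i] = (uint8_t)(i * 31 + 7);

        /* Matching buffers. */
        if ((candidate(a, b, len) != 0) != (memcmp(a, b, len) != 0))
            failures++;

        /* A single flipped byte at each position in turn. */
        for (size_t pos = 0; pos < len; pos++) {
            b[pos] ^= 0xFF;
            if ((candidate(a, b, len) != 0) != (memcmp(a, b, len) != 0))
                failures++;
            b[pos] ^= 0xFF; /* restore */
        }
    }
    return failures;
}

/* Deliberately broken candidate: it ignores the tail after the last full
 * 16-byte block, the kind of bug that a fixed-length test can miss. */
static int broken_blocks_only(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len - (len % 16)) != 0;
}
```

Calling `check_against_libc(memcmp, 130)` should report no failures, while `broken_blocks_only` fails as soon as a length that is not a multiple of 16 is exercised.
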
@github-actions

NOTE: This PR may contain new authors.

codecov bot commented Dec 15, 2025

Codecov Report

❌ Patch coverage is 50.77139% with 351 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.01%. Comparing base (a53ba4a) to head (b49a46e).
⚠️ Report is 30 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14496      +/-   ##
==========================================
- Coverage   82.11%   82.01%   -0.11%     
==========================================
  Files        1013     1014       +1     
  Lines      262322   263020     +698     
==========================================
+ Hits       215408   215705     +297     
- Misses      46914    47315     +401     
Flag Coverage Δ
fuzzcorpus 59.10% <35.64%> (-0.20%) ⬇️
livemode 18.71% <31.68%> (-0.17%) ⬇️
pcap 44.48% <44.96%> (-0.15%) ⬇️
suricata-verify 64.92% <44.63%> (-0.07%) ⬇️
unittests 59.24% <62.11%> (-0.02%) ⬇️


@victorjulien
Member Author

Here is Intel(R) Celeron(R) J4115. This doesn't even have AVX2. It's clear that the SSE4.2 implementation is bad everywhere.

Test MemcmpTestExactLibcMemcmp                                    : real:  24 - syn:  65 - stream1:  12 - stream2:  14 - 64-m:    20896k - 64-nm:    22408k - 1418-m:   137336k - 1418-nm:   151213k - 9000-m:   842395k - 9000-nm:   854754k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  26 - syn:  91 - stream1:  12 - stream2:  13 - 64-m:    52099k - 64-nm:    44665k - 1418-m:   936036k - 1418-nm:   938576k - 9000-m:  5736046k - 9000-nm:  5734249k - pass
Test MemcmpTestExactSCMemcmpSSE3                                  : real:  25 - syn:  71 - stream1:  11 - stream2:   8 - 64-m:    28166k - 64-nm:    28816k - 1418-m:   535924k - 1418-nm:   538214k - 9000-m:  3273644k - 9000-nm:  3279840k - pass
Test MemcmpTestExactSCMemcmpSSE42                                 : real:  26 - syn:  91 - stream1:  12 - stream2:  13 - 64-m:    52594k - 64-nm:    44892k - 1418-m:   939350k - 1418-nm:   939858k - 9000-m:  5738150k - 9000-nm:  5730704k - pass
Test MemcmpTestLowercaseDefault                                   : real:  25 - syn: 156 - stream1:  12 - stream2:  18 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  26 - syn: 200 - stream1:  12 - stream2:  11 - pass
Test MemcmpTestLowercaseSSE3                                      : real:  25 - syn:  80 - stream1:  12 - stream2:  10 - pass
Test MemcmpTestLowercaseSSE3and                                   : real:  26 - syn:  81 - stream1:  12 - stream2:  10 - pass
Test MemcmpTestLowercaseSSE3andload                               : real:  27 - syn:  80 - stream1:  11 - stream2:   9 - pass
Test MemcmpTestLowercaseSSE42                                     : real:  25 - syn: 156 - stream1:  13 - stream2:  18 - pass

@victorjulien
Member Author

Another result, this time from an AMD Ryzen 5 8640U, which is Zen 4, I think.

Test MemcmpTestExactLibcMemcmp                                    : real:  14 - syn:  21 - stream1:   7 - stream2:   8 - 64-m:    10629k - 64-nm:    11338k - 1418-m:    36333k - 1418-nm:    36440k - 9000-m:   273401k - 9000-nm:   218986k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  14 - syn:  23 - stream1:   8 - stream2:   5 - 64-m:     5665k - 64-nm:     5672k - 1418-m:    84829k - 1418-nm:    84662k - 9000-m:   533886k - 9000-nm:   535468k - pass
Test MemcmpTestExactSCMemcmpSSE3                                  : real:  14 - syn:  24 - stream1:   6 - stream2:   4 - 64-m:     7795k - 64-nm:     8508k - 1418-m:   140323k - 1418-nm:   142212k - 9000-m:   944856k - 9000-nm:   917885k - pass
Test MemcmpTestExactSCMemcmpSSE42                                 : real:  14 - syn:  28 - stream1:   6 - stream2:   5 - 64-m:    17009k - 64-nm:    13465k - 1418-m:   266062k - 1418-nm:   255893k - 9000-m:  1628982k - 9000-nm:  1631788k - pass
Test MemcmpTestExactSCMemcmpAVX2                                  : real:  15 - syn:  23 - stream1:   8 - stream2:   5 - 64-m:     5035k - 64-nm:     5047k - 1418-m:    80643k - 1418-nm:    81513k - 9000-m:   509727k - 9000-nm:   513927k - pass
Test MemcmpTestExactSCMemcmpAVX2_512                              : real:  14 - syn:  22 - stream1:   8 - stream2:   5 - 64-m:     4253k - 64-nm:     4965k - 1418-m:    53102k - 1418-nm:    53854k - 9000-m:   305546k - 9000-nm:   306151k - pass
Test MemcmpTestExactSCMemcmpAVX2_1024                             : real:  14 - syn:  23 - stream1:   7 - stream2:   6 - 64-m:     5015k - 64-nm:     5695k - 1418-m:    69478k - 1418-nm:    73504k - 9000-m:   436392k - 9000-nm:   434707k - pass
Test MemcmpTestExactSCMemcmpAVX512_128                            : real:  14 - syn:  26 - stream1:   8 - stream2:   4 - 64-m:    12839k - 64-nm:    13465k - 1418-m:   256584k - 1418-nm:   257268k - 9000-m:  1614410k - 9000-nm:  1615201k - pass
Test MemcmpTestExactSCMemcmpAVX512_256                            : real:  15 - syn:  23 - stream1:   7 - stream2:   5 - 64-m:     5043k - 64-nm:     7204k - 1418-m:    94291k - 1418-nm:    78307k - 9000-m:   598438k - 9000-nm:   518770k - pass
Test MemcmpTestExactSCMemcmpAVX512_512                            : real:  15 - syn:  23 - stream1:   8 - stream2:   5 - 64-m:     3589k - 64-nm:     4364k - 1418-m:    41084k - 1418-nm:    43062k - 9000-m:   308174k - 9000-nm:   252675k - pass
Test MemcmpTestExactSCMemcmpAVX512_2048                           : real:  15 - syn:  24 - stream1:   8 - stream2:   7 - 64-m:     4261k - 64-nm:     4252k - 1418-m:    41937k - 1418-nm:    49178k - 9000-m:   285001k - 9000-nm:   255779k - pass
Test MemcmpTestExactSCMemcmpAVX512_4096                           : real:  18 - syn:  26 - stream1:  12 - stream2:  11 - 64-m:     7087k - 64-nm:     5672k - 1418-m:    42251k - 1418-nm:    49595k - 9000-m:   285770k - 9000-nm:   251328k - pass
Test MemcmpTestExactSCMemcmpAVX512_6144                           : real:  18 - syn:  28 - stream1:  13 - stream2:  14 - 64-m:     7373k - 64-nm:     6380k - 1418-m:    42997k - 1418-nm:    49558k - 9000-m:   286529k - 9000-nm:   256126k - pass
Test MemcmpTestExactSCMemcmpSVE                                   : pass
Test MemcmpTestLowercaseDefault                                   : real:  16 - syn:  26 - stream1:   8 - stream2:   7 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  13 - syn:  64 - stream1:   7 - stream2:   7 - pass
Test MemcmpTestLowercaseSSE3                                      : real:  14 - syn:  27 - stream1:   7 - stream2:   4 - pass
Test MemcmpTestLowercaseSSE3and                                   : real:  15 - syn:  29 - stream1:   7 - stream2:   4 - pass
Test MemcmpTestLowercaseSSE3andload                               : real:  14 - syn:  28 - stream1:   8 - stream2:   5 - pass
Test MemcmpTestLowercaseSSE42                                     : real:  15 - syn:  74 - stream1:   7 - stream2:   7 - pass
Test MemcmpTestLowercaseAVX2                                      : real:  15 - syn:  25 - stream1:   7 - stream2:   5 - pass
Test MemcmpTestLowercaseAVX512_256                                : real:  16 - syn:  26 - stream1:   7 - stream2:   5 - pass
Test MemcmpTestLowercaseAVX512_512                                : real:  15 - syn:  27 - stream1:   7 - stream2:   6 - pass

Again SSE3 for lowercase, libc for memcmp.

@victorjulien
Member Author

11th Gen Intel(R) Core(TM) i7-1165G7

Test MemcmpTestExactLibcMemcmp                                    : real:  10 - syn:  21 - stream1:   6 - stream2:   5 - 64-m:     8243k - 64-nm:     9613k - 1418-m:    36644k - 1418-nm:    37353k - 9000-m:   292835k - 9000-nm:   219424k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  13 - syn:  24 - stream1:   6 - stream2:   4 - 64-m:     8412k - 64-nm:    10289k - 1418-m:   164810k - 1418-nm:   119773k - 9000-m:   610745k - 9000-nm:   675822k - pass
Test MemcmpTestExactSCMemcmpSSE3                                  : real:  11 - syn:  25 - stream1:   6 - stream2:   3 - 64-m:     8923k - 64-nm:    11695k - 1418-m:   155510k - 1418-nm:   159707k - 9000-m:   831818k - 9000-nm:   833079k - pass
Test MemcmpTestExactSCMemcmpSSE42                                 : real:  11 - syn:  35 - stream1:   5 - stream2:   6 - 64-m:    23674k - 64-nm:    21549k - 1418-m:   427160k - 1418-nm:   428058k - 9000-m:  2734271k - 9000-nm:  2734244k - pass
Test MemcmpTestExactSCMemcmpAVX2                                  : real:  13 - syn:  24 - stream1:   6 - stream2:   4 - 64-m:     6180k - 64-nm:     8922k - 1418-m:   118757k - 1418-nm:   114709k - 9000-m:   614597k - 9000-nm:   672528k - pass
Test MemcmpTestExactSCMemcmpAVX2_512                              : real:  13 - syn:  23 - stream1:   6 - stream2:   4 - 64-m:     4115k - 64-nm:     5489k - 1418-m:    71425k - 1418-nm:    69459k - 9000-m:   247446k - 9000-nm:   244307k - pass
Test MemcmpTestExactSCMemcmpAVX2_1024                             : real:  15 - syn:  24 - stream1:   6 - stream2:   6 - 64-m:     6458k - 64-nm:     5488k - 1418-m:    84463k - 1418-nm:    85972k - 9000-m:   487005k - 9000-nm:   486082k - pass
Test MemcmpTestExactSCMemcmpAVX512_128                            : real:  11 - syn:  25 - stream1:   6 - stream2:   3 - 64-m:    14848k - 64-nm:    15881k - 1418-m:   188574k - 1418-nm:   190808k - 9000-m:  1077234k - 9000-nm:  1077628k - pass
Test MemcmpTestExactSCMemcmpAVX512_256                            : real:  11 - syn:  24 - stream1:   6 - stream2:   4 - 64-m:     7550k - 64-nm:     8921k - 1418-m:   106938k - 1418-nm:   104048k - 9000-m:   490813k - 9000-nm:   538228k - pass
Test MemcmpTestExactSCMemcmpAVX512_512                            : real:  11 - syn:  22 - stream1:   6 - stream2:   5 - 64-m:     4115k - 64-nm:     4803k - 1418-m:    69531k - 1418-nm:    66902k - 9000-m:   233685k - 9000-nm:   238439k - pass
Test MemcmpTestExactSCMemcmpAVX512_2048                           : real:  12 - syn:  23 - stream1:   6 - stream2:   5 - 64-m:     4803k - 64-nm:     4800k - 1418-m:    36907k - 1418-nm:    40613k - 9000-m:   213868k - 9000-nm:   218831k - pass
Test MemcmpTestExactSCMemcmpAVX512_4096                           : real:  17 - syn:  27 - stream1:  10 - stream2:  10 - 64-m:     6174k - 64-nm:     6177k - 1418-m:    37757k - 1418-nm:    42166k - 9000-m:   215919k - 9000-nm:   221161k - pass
Test MemcmpTestExactSCMemcmpAVX512_6144                           : real:  18 - syn:  28 - stream1:  12 - stream2:  12 - 64-m:     7548k - 64-nm:     8036k - 1418-m:    38109k - 1418-nm:    44041k - 9000-m:   205674k - 9000-nm:   212483k - pass
Test MemcmpTestExactSCMemcmpSVE                                   : pass
Test MemcmpTestLowercaseDefault                                   : real:  15 - syn:  25 - stream1:   9 - stream2:   5 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  12 - syn:  76 - stream1:   6 - stream2:   5 - pass
Test MemcmpTestLowercaseSSE3                                      : real:  11 - syn:  32 - stream1:   6 - stream2:   4 - pass
Test MemcmpTestLowercaseSSE3and                                   : real:  12 - syn:  31 - stream1:   6 - stream2:   4 - pass
Test MemcmpTestLowercaseSSE3andload                               : real:  11 - syn:  31 - stream1:   6 - stream2:   4 - pass
Test MemcmpTestLowercaseSSE42                                     : real:  11 - syn:  86 - stream1:   7 - stream2:   8 - pass
Test MemcmpTestLowercaseAVX2                                      : real:  12 - syn:  27 - stream1:   7 - stream2:   6 - pass
Test MemcmpTestLowercaseAVX512_256                                : real:  12 - syn:  27 - stream1:   6 - stream2:   5 - pass
Test MemcmpTestLowercaseAVX512_512                                : real:  11 - syn:  25 - stream1:   6 - stream2:   5 - pass

@victorjulien
Member Author

AMD Ryzen AI 5 340, which is Zen 5:

Test MemcmpTestExactLibcMemcmp                                    : real:   7 - syn:  10 - stream1:   3 - stream2:   3 - 64-m:     4125k - 64-nm:     4539k - 1418-m:    18978k - 1418-nm:    19383k - 9000-m:   145646k - 9000-nm:   117973k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:   7 - syn:  11 - stream1:   2 - stream2:   2 - 64-m:     2888k - 64-nm:     3299k - 1418-m:    32756k - 1418-nm:    34632k - 9000-m:   224404k - 9000-nm:   226275k - pass
Test MemcmpTestExactSCMemcmpSSE3                                  : real:   6 - syn:  11 - stream1:   2 - stream2:   1 - 64-m:     4533k - 64-nm:     4537k - 1418-m:    90718k - 1418-nm:    75208k - 9000-m:   415310k - 9000-nm:   410439k - pass
Test MemcmpTestExactSCMemcmpSSE42                                 : real:   6 - syn:  14 - stream1:   3 - stream2:   2 - 64-m:     8253k - 64-nm:     7425k - 1418-m:   171270k - 1418-nm:   171057k - 9000-m:   958974k - 9000-nm:   963938k - pass
Test MemcmpTestExactSCMemcmpAVX2                                  : real:   7 - syn:  10 - stream1:   3 - stream2:   2 - 64-m:     3301k - 64-nm:     3300k - 1418-m:    41442k - 1418-nm:    42326k - 9000-m:   267418k - 9000-nm:   273482k - pass
Test MemcmpTestExactSCMemcmpAVX2_512                              : real:   7 - syn:  10 - stream1:   3 - stream2:   2 - 64-m:     2885k - 64-nm:     2888k - 1418-m:    19469k - 1418-nm:    20610k - 9000-m:   119958k - 9000-nm:   120442k - pass
Test MemcmpTestExactSCMemcmpAVX2_1024                             : real:   7 - syn:  11 - stream1:   3 - stream2:   2 - 64-m:     3300k - 64-nm:     3297k - 1418-m:    31349k - 1418-nm:    32764k - 9000-m:   191632k - 9000-nm:   192039k - pass
Test MemcmpTestExactSCMemcmpAVX512_128                            : real:   7 - syn:  12 - stream1:   3 - stream2:   2 - 64-m:     5013k - 64-nm:     4730k - 1418-m:    89094k - 1418-nm:    87865k - 9000-m:   485987k - 9000-nm:   483976k - pass
Test MemcmpTestExactSCMemcmpAVX512_256                            : real:   7 - syn:  11 - stream1:   3 - stream2:   2 - 64-m:     3711k - 64-nm:     3297k - 1418-m:    42330k - 1418-nm:    43524k - 9000-m:   278824k - 9000-nm:   281804k - pass
Test MemcmpTestExactSCMemcmpAVX512_512                            : real:   7 - syn:  12 - stream1:   3 - stream2:   2 - 64-m:     3301k - 64-nm:     2885k - 1418-m:    24472k - 1418-nm:    28953k - 9000-m:   210097k - 9000-nm:   216215k - pass
Test MemcmpTestExactSCMemcmpAVX512_2048                           : real:   7 - syn:  11 - stream1:   3 - stream2:   3 - 64-m:     2480k - 64-nm:     3300k - 1418-m:    23870k - 1418-nm:    28866k - 9000-m:   174869k - 9000-nm:   146975k - pass
Test MemcmpTestExactSCMemcmpAVX512_4096                           : real:   9 - syn:  13 - stream1:   5 - stream2:   6 - 64-m:     3092k - 64-nm:     3326k - 1418-m:    23792k - 1418-nm:    28869k - 9000-m:   174944k - 9000-nm:   146999k - pass
Test MemcmpTestExactSCMemcmpAVX512_6144                           : real:   9 - syn:  14 - stream1:   6 - stream2:   7 - 64-m:     4124k - 64-nm:     4141k - 1418-m:    23847k - 1418-nm:    28626k - 9000-m:   175021k - 9000-nm:   146790k - pass
Test MemcmpTestExactSCMemcmpSVE                                   : pass
Test MemcmpTestLowercaseDefault                                   : real:   7 - syn:  12 - stream1:   3 - stream2:   2 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:   6 - syn:  31 - stream1:   2 - stream2:   3 - pass
Test MemcmpTestLowercaseSSE3                                      : real:   6 - syn:  14 - stream1:   2 - stream2:   2 - pass
Test MemcmpTestLowercaseSSE3and                                   : real:   6 - syn:  14 - stream1:   2 - stream2:   2 - pass
Test MemcmpTestLowercaseSSE3andload                               : real:   6 - syn:  13 - stream1:   3 - stream2:   2 - pass
Test MemcmpTestLowercaseSSE42                                     : real:   6 - syn:  46 - stream1:   2 - stream2:   3 - pass
Test MemcmpTestLowercaseAVX2                                      : real:   6 - syn:  12 - stream1:   3 - stream2:   2 - pass
Test MemcmpTestLowercaseAVX512_256                                : real:   6 - syn:  12 - stream1:   3 - stream2:   2 - pass
Test MemcmpTestLowercaseAVX512_512                                : real:   6 - syn:  13 - stream1:   3 - stream2:   2 - pass

@victorjulien
Member Author

Intel(R) Xeon(R) CPU E5-2680 v4

Test MemcmpTestExactLibcMemcmp                                    : real:  18 - syn:  30 - stream1:   8 - stream2:   8 - 64-m:    11666k - 64-nm:    10932k - 1418-m:    47616k - 1418-nm:    48267k - 9000-m:   353156k - 9000-nm:   258404k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  18 - syn:  33 - stream1:   9 - stream2:   6 - 64-m:    11665k - 64-nm:    10762k - 1418-m:   137791k - 1418-nm:   139752k - 9000-m:   709722k - 9000-nm:   658169k - pass
Test MemcmpTestExactSCMemcmpSSE3                                  : real:  18 - syn:  35 - stream1:   8 - stream2:   5 - 64-m:    15309k - 64-nm:    16028k - 1418-m:   234686k - 1418-nm:   275665k - 9000-m:  1359570k - 9000-nm:  1358102k - pass
Test MemcmpTestExactSCMemcmpSSE42                                 : real:  17 - syn:  46 - stream1:   9 - stream2:   9 - 64-m:    31336k - 64-nm:    28426k - 1418-m:   546909k - 1418-nm:   548956k - 9000-m:  3313141k - 9000-nm:  3314523k - pass
Test MemcmpTestExactSCMemcmpAVX2                                  : real:  19 - syn:  35 - stream1:  11 - stream2:   6 - 64-m:    10368k - 64-nm:    10964k - 1418-m:   133663k - 1418-nm:   137097k - 9000-m:   700109k - 9000-nm:   697296k - pass
Test MemcmpTestExactSCMemcmpAVX2_512                              : real:  21 - syn:  34 - stream1:  12 - stream2:   7 - 64-m:     6575k - 64-nm:     5828k - 1418-m:    78720k - 1418-nm:    76976k - 9000-m:   339169k - 9000-nm:   341109k - pass
Test MemcmpTestExactSCMemcmpAVX2_1024                             : real:  22 - syn:  36 - stream1:  12 - stream2:  10 - 64-m:    12398k - 64-nm:    11660k - 1418-m:   106713k - 1418-nm:   108512k - 9000-m:   614496k - 9000-nm:   573370k - pass
Test MemcmpTestLowercaseDefault                                   : real:  19 - syn:  36 - stream1:   8 - stream2:   6 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  16 - syn:  89 - stream1:   6 - stream2:   6 - pass
Test MemcmpTestLowercaseSSE3                                      : real:  17 - syn:  52 - stream1:   7 - stream2:   6 - pass
Test MemcmpTestLowercaseSSE3and                                   : real:  17 - syn:  54 - stream1:   7 - stream2:   6 - pass
Test MemcmpTestLowercaseSSE3andload                               : real:  18 - syn:  40 - stream1:   7 - stream2:   4 - pass
Test MemcmpTestLowercaseSSE42                                     : real:  17 - syn: 101 - stream1:   6 - stream2:  11 - pass
Test MemcmpTestLowercaseAVX2                                      : real:  18 - syn:  38 - stream1:   8 - stream2:   7 - pass

SSE3 again here, but this time the andload variant (syn: 40, versus 52 for plain SSE3)?
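For readers unfamiliar with the lowercase variants being compared, here is a minimal sketch of what a 16-bytes-at-a-time lowercase compare can look like. It is not the code from this PR: the name `lowercase_cmp_sse_sketch`, the return convention (0 on match, 1 on mismatch) and the assumption that `s1` is already lowercase are all illustrative. It uses only SSE2 intrinsics so that it builds with default x86_64 flags, and it does not try to show what distinguishes the `and` and `andload` variants.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not the function from this PR.
 * Returns 0 if s1 equals lowercase(s2), 1 otherwise.
 * s1 is assumed to be lowercase already. */
static int lowercase_cmp_sse_sketch(const uint8_t *s1, const uint8_t *s2, size_t len)
{
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i below_a = _mm_set1_epi8('A' - 1);
    const __m128i above_z = _mm_set1_epi8('Z' + 1);
    const __m128i delta = _mm_set1_epi8(0x20);
    for (; i + 16 <= len; i += 16) {
        __m128i a = _mm_loadu_si128((const __m128i *)(s1 + i));
        __m128i b = _mm_loadu_si128((const __m128i *)(s2 + i));
        /* Signed compares: bytes >= 0x80 are negative and never match 'A'..'Z'. */
        __m128i is_upper = _mm_and_si128(_mm_cmpgt_epi8(b, below_a),
                                         _mm_cmpgt_epi8(above_z, b));
        /* Add 0x20 only to the lanes that hold an uppercase ASCII letter. */
        __m128i b_lower = _mm_add_epi8(b, _mm_and_si128(is_upper, delta));
        /* All 16 lanes must compare equal, i.e. all 16 mask bits set. */
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(a, b_lower)) != 0xFFFF)
            return 1;
    }
#endif
    /* Scalar tail; this is also the whole loop on non-SSE2 builds. */
    for (; i < len; i++) {
        uint8_t c = s2[i];
        if (c >= 'A' && c <= 'Z')
            c = (uint8_t)(c + 0x20);
        if (s1[i] != c)
            return 1;
    }
    return 0;
}
```

On x86_64, comparing `"content-type: text/html"` with `"Content-Type: TEXT/html"` (23 bytes) sends the first 16 bytes through the vector loop and the remaining 7 through the scalar tail, and returns 0.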

@victorjulien
Member Author

Intel(R) Core(TM) Ultra 5 225H

Test MemcmpTestExactLibcMemcmp                                    : real:  11 - syn:  19 - stream1:   3 - stream2:   3 - 64-m:     6827k - 64-nm:     7579k - 1418-m:    29991k - 1418-nm:    29842k - 9000-m:   228534k - 9000-nm:   184538k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  11 - syn:  22 - stream1:   4 - stream2:   3 - 64-m:     5304k - 64-nm:     5853k - 1418-m:    88889k - 1418-nm:    92194k - 9000-m:   367936k - 9000-nm:   386910k - pass
Test MemcmpTestExactSCMemcmpSSE3                                  : real:  11 - syn:  22 - stream1:   3 - stream2:   2 - 64-m:     9837k - 64-nm:     9544k - 1418-m:   131471k - 1418-nm:   136120k - 9000-m:   638797k - 9000-nm:   637389k - pass
Test MemcmpTestExactSCMemcmpSSE42                                 : real:  11 - syn:  41 - stream1:   4 - stream2:   6 - 64-m:    35556k - 64-nm:    33707k - 1418-m:   694637k - 1418-nm:   698492k - 9000-m:  4535592k - 9000-nm:  4499123k - pass
Test MemcmpTestExactSCMemcmpAVX2                                  : real:  12 - syn:  21 - stream1:   5 - stream2:   3 - 64-m:     4559k - 64-nm:     5312k - 1418-m:    89304k - 1418-nm:    91361k - 9000-m:   369598k - 9000-nm:   386252k - pass
Test MemcmpTestExactSCMemcmpAVX2_512                              : real:  12 - syn:  21 - stream1:   6 - stream2:   4 - 64-m:     3016k - 64-nm:     3772k - 1418-m:    61530k - 1418-nm:    65567k - 9000-m:   199974k - 9000-nm:   253146k - pass
Test MemcmpTestExactSCMemcmpAVX2_1024                             : real:  12 - syn:  22 - stream1:   5 - stream2:   5 - 64-m:     4529k - 64-nm:     3772k - 1418-m:    62013k - 1418-nm:    64772k - 9000-m:   342842k - 9000-nm:   375542k - pass
Test MemcmpTestLowercaseDefault                                   : real:  11 - syn:  22 - stream1:   3 - stream2:   3 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  10 - syn:  56 - stream1:   3 - stream2:   2 - pass
Test MemcmpTestLowercaseSSE3                                      : real:  10 - syn:  23 - stream1:   3 - stream2:   3 - pass
Test MemcmpTestLowercaseSSE3and                                   : real:  10 - syn:  24 - stream1:   3 - stream2:   3 - pass
Test MemcmpTestLowercaseSSE3andload                               : real:  10 - syn:  24 - stream1:   3 - stream2:   2 - pass
Test MemcmpTestLowercaseSSE42                                     : real:  10 - syn: 114 - stream1:   4 - stream2:  11 - pass
Test MemcmpTestLowercaseAVX2                                      : real:  10 - syn:  22 - stream1:   4 - stream2:   4 - pass

SSE3 looks best for lowercase again.

@victorjulien
Member Author

Intel(R) Xeon(R) CPU E5-2690 v2: SSE3 looks best for lowercase again.

Test MemcmpTestExactLibcMemcmp                                    : real:  24 - syn:  66 - stream1:  11 - stream2:  13 - 64-m:    21057k - 64-nm:    22055k - 1418-m:   128613k - 1418-nm:   138698k - 9000-m:   775009k - 9000-nm:   774366k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  23 - syn:  81 - stream1:  11 - stream2:  12 - 64-m:    45906k - 64-nm:    37149k - 1418-m:   786619k - 1418-nm:   792240k - 9000-m:  4817429k - 9000-nm:  4818158k - pass
Test MemcmpTestExactSCMemcmpSSE3                                  : real:  20 - syn:  64 - stream1:  10 - stream2:   7 - 64-m:    21482k - 64-nm:    21807k - 1418-m:   288774k - 1418-nm:   294894k - 9000-m:  1498529k - 9000-nm:  1496817k - pass
Test MemcmpTestExactSCMemcmpSSE42                                 : real:  21 - syn:  80 - stream1:  11 - stream2:  12 - 64-m:    47768k - 64-nm:    37446k - 1418-m:   787716k - 1418-nm:   793429k - 9000-m:  4813964k - 9000-nm:  4821313k - pass
Test MemcmpTestLowercaseDefault                                   : real:  23 - syn: 146 - stream1:  11 - stream2:  14 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  21 - syn: 136 - stream1:  11 - stream2:  10 - pass
Test MemcmpTestLowercaseSSE3                                      : real:  22 - syn:  69 - stream1:  11 - stream2:   8 - pass
Test MemcmpTestLowercaseSSE3and                                   : real:  22 - syn:  67 - stream1:  11 - stream2:   8 - pass
Test MemcmpTestLowercaseSSE3andload                               : real:  22 - syn:  66 - stream1:  10 - stream2:   7 - pass
Test MemcmpTestLowercaseSSE42                                     : real:  23 - syn: 147 - stream1:  11 - stream2:  14 - pass

@victorjulien
Member Author

AMD Ryzen Threadripper 2990WX: looks like MemcmpTestLowercaseSSE3andload is best again.

Test MemcmpTestExactLibcMemcmp                                    : real:  23 - syn:  46 - stream1:  11 - stream2:  11 - 64-m:    13568k - 64-nm:    14400k - 1418-m:    83680k - 1418-nm:    85323k - 9000-m:   504099k - 9000-nm:   506618k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  24 - syn:  49 - stream1:  16 - stream2:  10 - 64-m:    13623k - 64-nm:    13684k - 1418-m:   133194k - 1418-nm:   134644k - 9000-m:   637153k - 9000-nm:   638095k - pass
Test MemcmpTestExactSCMemcmpSSE3                                  : real:  23 - syn:  49 - stream1:  11 - stream2:   6 - 64-m:    17612k - 64-nm:    18809k - 1418-m:   343156k - 1418-nm:   346920k - 9000-m:  1948529k - 9000-nm:  1950242k - pass
Test MemcmpTestExactSCMemcmpSSE42                                 : real:  23 - syn:  55 - stream1:  12 - stream2:   8 - 64-m:    27155k - 64-nm:    23561k - 1418-m:   416879k - 1418-nm:   417319k - 9000-m:  2420617k - 9000-nm:  2414050k - pass
Test MemcmpTestExactSCMemcmpAVX2                                  : real:  29 - syn:  53 - stream1:  17 - stream2:   9 - 64-m:    12633k - 64-nm:    13026k - 1418-m:   161693k - 1418-nm:   162178k - 9000-m:   763456k - 9000-nm:   764205k - pass
Test MemcmpTestExactSCMemcmpAVX2_512                              : real:  29 - syn:  51 - stream1:  17 - stream2:   9 - 64-m:    10949k - 64-nm:    10100k - 1418-m:    79734k - 1418-nm:    83247k - 9000-m:   297675k - 9000-nm:   296861k - pass
Test MemcmpTestExactSCMemcmpAVX2_1024                             : real:  29 - syn:  53 - stream1:  17 - stream2:  11 - 64-m:    12640k - 64-nm:    11749k - 1418-m:   120584k - 1418-nm:   127384k - 9000-m:   669853k - 9000-nm:   677903k - pass
Test MemcmpTestLowercaseDefault                                   : real:  26 - syn:  55 - stream1:  10 - stream2:   9 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  20 - syn: 105 - stream1:   8 - stream2:   8 - pass
Test MemcmpTestLowercaseSSE3                                      : real:  21 - syn:  59 - stream1:   9 - stream2:   7 - pass
Test MemcmpTestLowercaseSSE3and                                   : real:  21 - syn:  58 - stream1:   9 - stream2:   6 - pass
Test MemcmpTestLowercaseSSE3andload                               : real:  21 - syn:  56 - stream1:   9 - stream2:   6 - pass
Test MemcmpTestLowercaseSSE42                                     : real:  21 - syn: 105 - stream1:   9 - stream2:   9 - pass
Test MemcmpTestLowercaseAVX2                                      : real:  45 - syn:  59 - stream1:  11 - stream2:   8 - pass

@victorjulien
Member Author

victorjulien commented Dec 15, 2025

The Apple M1 result is not really useful; maybe it needs a longer test?

Test MemcmpTestExactLibcMemcmp                                    : real:   0 - syn:   0 - stream1:   0 - stream2:   0 - 64-m:       29k - 64-nm:       29k - 1418-m:      820k - 1418-nm:      799k - 9000-m:     5738k - 9000-nm:     5760k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:   0 - syn:   0 - stream1:   0 - stream2:   0 - 64-m:       29k - 64-nm:       29k - 1418-m:      821k - 1418-nm:      801k - 9000-m:     5746k - 9000-nm:     5762k - pass
Test MemcmpTestLowercaseDefault                                   : real:   0 - syn:   2 - stream1:   0 - stream2:   0 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:   0 - syn:   2 - stream1:   0 - stream2:   0 - pass
Test MemcmpTestLowercaseNeon                                      : real:   0 - syn:   0 - stream1:   0 - stream2:   0 - pass

Still, Neon looks better than no SIMD (syn: 0 vs 2)?
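On the "longer test" question: when most columns round to 0, one option is to make the iteration count a parameter and raise it until the numbers separate. The sketch below is a hypothetical timing helper, not the unit test harness used for the numbers in this thread; `time_compare`, its parameters and the use of `clock()` are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Same shape as memcmp(), so libc memcmp can be passed directly. */
typedef int (*cmp_fn)(const void *, const void *, size_t);

/* Hypothetical helper: calls fn `iterations` times on two len-byte buffers
 * and returns the elapsed CPU time in seconds. The number of non-zero
 * results is stored in *mismatches so the result of each call is used. */
static double time_compare(cmp_fn fn, const void *a, const void *b, size_t len,
                           uint64_t iterations, uint64_t *mismatches)
{
    /* Calling through a volatile pointer keeps the compiler from hoisting
     * the loop-invariant call out of the loop. */
    cmp_fn volatile call = fn;
    uint64_t count = 0;
    clock_t start = clock();
    for (uint64_t i = 0; i < iterations; i++) {
        if (call(a, b, len) != 0)
            count++;
    }
    clock_t end = clock();
    *mismatches = count;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

For example, `time_compare(memcmp, a, b, 64, iterations, &mm)` can be run with a growing `iterations` value until the matching and non-matching cases report clearly non-zero times on a fast core.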

@victorjulien
Member Author

Apple M4

Test MemcmpTestExactLibcMemcmp                                    : real:   4 - syn:  17 - stream1:   1 - stream2:   1 - 64-m:      720k - 64-nm:      727k - 1418-m:    23886k - 1418-nm:    23992k - 9000-m:   165205k - 9000-nm:   163647k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:   4 - syn:  17 - stream1:   1 - stream2:   1 - 64-m:      720k - 64-nm:      725k - 1418-m:    23220k - 1418-nm:    24218k - 9000-m:   163588k - 9000-nm:   162890k - pass
Test MemcmpTestLowercaseDefault                                   : real:   4 - syn:  74 - stream1:   1 - stream2:   1 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:   4 - syn:  74 - stream1:   1 - stream2:   1 - pass
Test MemcmpTestLowercaseNeon                                      : real:   4 - syn:  19 - stream1:   1 - stream2:   1 - pass

Neon doing well.
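As an illustration of why NEON maps well onto this problem, here is a minimal sketch of a lowercase compare using AArch64 NEON intrinsics. It is not the implementation from this PR: the name `lowercase_cmp_neon_sketch`, the 0/1 return convention and the assumption that `s1` is already lowercase are illustrative. On non-AArch64 builds only the scalar loop is compiled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper, not the function from this PR.
 * Returns 0 if s1 equals lowercase(s2), 1 otherwise.
 * s1 is assumed to be lowercase already. */
static int lowercase_cmp_neon_sketch(const uint8_t *s1, const uint8_t *s2, size_t len)
{
    size_t i = 0;
#if defined(__aarch64__) && defined(__ARM_NEON)
    const uint8x16_t upper_a = vdupq_n_u8('A');
    const uint8x16_t upper_z = vdupq_n_u8('Z');
    const uint8x16_t delta = vdupq_n_u8(0x20);
    for (; i + 16 <= len; i += 16) {
        uint8x16_t a = vld1q_u8(s1 + i);
        uint8x16_t b = vld1q_u8(s2 + i);
        /* Unsigned range check: lanes holding 'A'..'Z' become 0xFF. */
        uint8x16_t is_upper = vandq_u8(vcgeq_u8(b, upper_a), vcleq_u8(b, upper_z));
        /* Add 0x20 only to the uppercase lanes. */
        uint8x16_t b_lower = vaddq_u8(b, vandq_u8(is_upper, delta));
        /* The minimum across lanes is 0xFF only if every lane compared equal. */
        if (vminvq_u8(vceqq_u8(a, b_lower)) != 0xFF)
            return 1;
    }
#endif
    /* Scalar tail; this is also the whole loop on non-NEON builds. */
    for (; i < len; i++) {
        uint8_t c = s2[i];
        if (c >= 'A' && c <= 'Z')
            c = (uint8_t)(c + 0x20);
        if (s1[i] != c)
            return 1;
    }
    return 0;
}
```

Unlike the SSE2 compares, the NEON compares used here are unsigned, so the range check can be written directly as `b >= 'A' && b <= 'Z'` per lane.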

@victorjulien
Member Author

Arm A55 core

Test MemcmpTestExactLibcMemcmp                                    : real:   0 - syn:   5 - stream1:   0 - stream2:   0 - 64-m:      651k - 64-nm:      649k - 1418-m:     9233k - 1418-nm:     9212k - 9000-m:    56362k - 9000-nm:    56358k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:   0 - syn:   5 - stream1:   0 - stream2:   0 - 64-m:      649k - 64-nm:      649k - 1418-m:     9227k - 1418-nm:     9176k - 9000-m:    56330k - 9000-nm:    56335k - pass
Test MemcmpTestLowercaseDefault                                   : real:   0 - syn:   9 - stream1:   0 - stream2:   0 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:   0 - syn:   9 - stream1:   0 - stream2:   0 - pass
Test MemcmpTestLowercaseNeon                                      : real:   0 - syn:   5 - stream1:   0 - stream2:   0 - pass

Neon better as well.

Arm A76 core

Test MemcmpTestExactLibcMemcmp                                    : real:   0 - syn:   0 - stream1:   0 - stream2:   0 - 64-m:      164k - 64-nm:      165k - 1418-m:     1389k - 1418-nm:     1414k - 9000-m:     8060k - 9000-nm:     8081k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:   0 - syn:   0 - stream1:   0 - stream2:   0 - 64-m:      163k - 64-nm:      165k - 1418-m:     1389k - 1418-nm:     1412k - 9000-m:     7965k - 9000-nm:     8056k - pass
Test MemcmpTestLowercaseDefault                                   : real:   0 - syn:   1 - stream1:   0 - stream2:   0 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:   0 - syn:   1 - stream1:   0 - stream2:   0 - pass
Test MemcmpTestLowercaseNeon                                      : real:   0 - syn:   1 - stream1:   0 - stream2:   0 - pass

Unclear result.

@victorjulien
Member Author

Overall it seems:
All archs: exact memcmp should be done by libc's memcmp.
x86_64: SSE3 or SSE3andload for MemcmpLowercase
arm64: NEON version for MemcmpLowercase
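The selection policy above could be sketched roughly as follows. This is only an illustration of the dispatch idea: every name here is hypothetical, the SIMD entries are placeholders that forward to the scalar loop, and the 0/1 return convention is an assumption. The x86_64 branch uses the GCC/clang builtin `__builtin_cpu_supports` for a runtime check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*lowercase_cmp_fn)(const uint8_t *, const uint8_t *, size_t);

/* Exact compare: always defer to libc. Returns 0 on match, 1 otherwise. */
static int exact_cmp(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len) != 0;
}

/* Scalar fallback: s1 is assumed lowercase, s2 is lowercased on the fly. */
static int lowercase_cmp_scalar(const uint8_t *s1, const uint8_t *s2, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t c = s2[i];
        if (c >= 'A' && c <= 'Z')
            c = (uint8_t)(c + 0x20);
        if (s1[i] != c)
            return 1;
    }
    return 0;
}

#if defined(__x86_64__) && defined(__GNUC__)
/* Placeholder for an SSE3 implementation; forwards to scalar in this sketch. */
static int lowercase_cmp_sse3_placeholder(const uint8_t *s1, const uint8_t *s2, size_t len)
{
    return lowercase_cmp_scalar(s1, s2, len);
}
#elif defined(__aarch64__)
/* Placeholder for a NEON implementation; forwards to scalar in this sketch. */
static int lowercase_cmp_neon_placeholder(const uint8_t *s1, const uint8_t *s2, size_t len)
{
    return lowercase_cmp_scalar(s1, s2, len);
}
#endif

/* Pick the lowercase compare per architecture, scalar when nothing else fits. */
static lowercase_cmp_fn select_lowercase_cmp(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("sse3"))
        return lowercase_cmp_sse3_placeholder;
#elif defined(__aarch64__)
    return lowercase_cmp_neon_placeholder;
#endif
    return lowercase_cmp_scalar;
}
```

A caller would resolve the pointer once at startup with `select_lowercase_cmp()` and reuse it, while exact comparisons go straight to `exact_cmp`.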

The Minisforum is an outlier. Not sure what is up with that.

@suricata-qa

ERROR: QA failed on SURI_TLPW2_autofp_suri_time.

Pipeline = 28784

@victorjulien
Member Author

The Minisforum result was with clang 21. With gcc 15.2 the results look more in line with my expectations.
Arm A720 core:

Test MemcmpTestExactLibcMemcmp                                    : real:   5 - syn:  33 - stream1:   2 - stream2:   2 - 64-m:     2325k - 64-nm:     2312k - 1418-m:    46451k - 1418-nm:    46454k - 9000-m:   288764k - 9000-nm:   288772k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:   5 - syn:  33 - stream1:   2 - stream2:   2 - 64-m:     2322k - 64-nm:     2316k - 1418-m:    46508k - 1418-nm:    46500k - 9000-m:   289065k - 9000-nm:   289052k - pass
Test MemcmpTestExactSCMemcmpSVE                                   : real:   3 - syn:  34 - stream1:   1 - stream2:   1 - 64-m:     4877k - 64-nm:     4239k - 1418-m:   105400k - 1418-nm:   107723k - 9000-m:   633985k - 9000-nm:   659413k - pass
Test MemcmpTestLowercaseDefault                                   : real:   5 - syn:  61 - stream1:   2 - stream2:   2 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:   5 - syn:  61 - stream1:   2 - stream2:   2 - pass
Test MemcmpTestLowercaseNeon                                      : real:   6 - syn:  38 - stream1:   3 - stream2:   2 - pass

Arm A520 core:

Test MemcmpTestExactLibcMemcmp                                    : real:  28 - syn: 461 - stream1:   9 - stream2:   9 - 64-m:    37669k - 64-nm:    37646k - 1418-m:   382968k - 1418-nm:   377784k - 9000-m:  2306125k - 9000-nm:  2391807k - pass
Test MemcmpTestExactSCMemcmpDefault                               : real:  28 - syn: 460 - stream1:   9 - stream2:   9 - 64-m:    37575k - 64-nm:    37547k - 1418-m:   380988k - 1418-nm:   381145k - 9000-m:  2302979k - 9000-nm:  2384847k - pass
Test MemcmpTestExactSCMemcmpSVE                                   : real:  28 - syn: 472 - stream1:   8 - stream2:   8 - 64-m:    28963k - 64-nm:    27824k - 1418-m:   628201k - 1418-nm:   626979k - 9000-m:  3933020k - 9000-nm:  3934662k - pass
Test MemcmpTestLowercaseDefault                                   : real:  34 - syn: 628 - stream1:  13 - stream2:  12 - pass
Test MemcmpTestLowercaseNoSIMD                                    : real:  34 - syn: 625 - stream1:  13 - stream2:  12 - pass
Test MemcmpTestLowercaseNeon                                      : real:  35 - syn: 516 - stream1:  13 - stream2:  17 - pass

@catenacyber catenacyber added the needs rebase Needs rebase to main label Jan 2, 2026
