Conversation

@mkleen
@mkleen mkleen commented Dec 7, 2025

Which issue does this PR close?

Rationale for this change

It's explained in the issue.

What changes are included in this PR?

This adds a special implementation for Utf8View/BinaryView scalars for zip based on the design from #8653. It also includes tests. Benchmarks are available here:

Are these changes tested?

Yes.

Are there any user-facing changes?

There is a new struct ByteViewScalarImpl.

Benchmarks

System: Apple M1 Max with 10 cores on macOS 26.1

group                                                                                                       branch                                 main
-----                                                                                                       ------                                 ----
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/10pct_true       1.00      3.5±0.04µs        ? ?/sec    37.06   128.9±1.36µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/1pct_true        1.00      3.5±0.07µs        ? ?/sec    35.76   125.1±1.76µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/50pct_nulls      1.00      3.7±0.12µs        ? ?/sec    36.91   136.8±2.17µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/50pct_true       1.00      3.5±0.06µs        ? ?/sec    40.30   139.9±2.11µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/90pct_true       1.00      3.6±0.10µs        ? ?/sec    30.57   108.5±2.62µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/99pct_true       1.00      3.5±0.05µs        ? ?/sec    28.40    99.8±2.12µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/all_false        1.00      3.5±0.02µs        ? ?/sec    36.04   127.4±3.14µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_null_scalar_vs_null_scalar/all_true         1.00      3.5±0.08µs        ? ?/sec    27.39    97.1±1.11µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/10pct_true                    1.00     28.2±0.37µs        ? ?/sec    2.70     75.9±0.61µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/1pct_true                     1.00      7.2±0.24µs        ? ?/sec    9.89    71.4±12.56µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/50pct_nulls                   1.00     51.0±2.97µs        ? ?/sec    1.75     89.4±2.50µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/50pct_true                    1.00     62.1±1.00µs        ? ?/sec    1.61     99.7±4.68µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/90pct_true                    1.00     28.8±0.64µs        ? ?/sec    2.63     75.7±1.22µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/99pct_true                    1.00      7.7±0.11µs        ? ?/sec    8.98     69.0±0.74µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/all_false                     1.00      3.7±0.13µs        ? ?/sec    19.06    69.8±1.55µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/non_nulls_scalars/all_true                      1.00      3.6±0.10µs        ? ?/sec    18.90    68.0±1.12µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/10pct_true              1.00      3.8±0.07µs        ? ?/sec    28.85   108.4±3.09µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/1pct_true               1.00      3.8±0.09µs        ? ?/sec    25.83    98.7±2.71µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/50pct_nulls             1.00      3.9±0.06µs        ? ?/sec    32.25   127.3±7.41µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/50pct_true              1.00      3.7±0.06µs        ? ?/sec    37.66   139.5±3.00µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/90pct_true              1.00      3.8±0.16µs        ? ?/sec    34.52   129.5±1.53µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/99pct_true              1.00      3.7±0.05µs        ? ?/sec    33.83   124.8±1.28µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/all_false               1.00      3.8±0.09µs        ? ?/sec    26.08    98.8±2.02µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 10/null_vs_non_null_scalar/all_true                1.00      3.8±0.08µs        ? ?/sec    32.56   123.9±1.48µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/10pct_true      1.00      3.6±0.06µs        ? ?/sec    36.09   129.8±6.06µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/1pct_true       1.00      3.6±0.35µs        ? ?/sec    34.05   122.9±5.06µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_nulls     1.00      3.7±0.12µs        ? ?/sec    36.77   137.9±5.49µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_true      1.00      3.6±0.09µs        ? ?/sec    38.23   137.4±3.35µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/90pct_true      1.00      3.6±0.06µs        ? ?/sec    29.20   104.8±1.64µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/99pct_true      1.00      3.6±0.15µs        ? ?/sec    26.94    96.9±2.73µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/all_false       1.00      3.6±0.05µs        ? ?/sec    34.97   127.5±5.81µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_null_scalar_vs_null_scalar/all_true        1.00      3.8±1.05µs        ? ?/sec    24.98    95.0±2.14µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/10pct_true                   1.00     28.9±0.46µs        ? ?/sec    2.69     77.7±1.57µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/1pct_true                    1.00      7.3±0.09µs        ? ?/sec    9.81     71.6±1.96µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/50pct_nulls                  1.00     50.3±1.16µs        ? ?/sec    1.74     87.7±1.14µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/50pct_true                   1.00     63.5±1.44µs        ? ?/sec    1.59    100.7±1.97µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/90pct_true                   1.00     29.8±0.48µs        ? ?/sec    2.64     78.6±2.85µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/99pct_true                   1.00      8.2±0.12µs        ? ?/sec    8.54     69.7±0.91µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/all_false                    1.00      3.8±0.07µs        ? ?/sec    18.77    71.6±1.51µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/non_nulls_scalars/all_true                     1.00      3.8±0.11µs        ? ?/sec    18.31    68.8±1.10µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/10pct_true             1.00      3.8±0.07µs        ? ?/sec    27.36   104.3±1.35µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/1pct_true              1.00      3.8±0.07µs        ? ?/sec    24.86    94.8±1.12µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/50pct_nulls            1.00      4.0±0.04µs        ? ?/sec    29.84   117.9±1.34µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/50pct_true             1.00      3.9±0.21µs        ? ?/sec    35.19   137.1±3.87µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/90pct_true             1.00      3.8±0.06µs        ? ?/sec    32.78   125.8±1.73µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/99pct_true             1.00      3.8±0.11µs        ? ?/sec    31.87   121.5±1.47µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/all_false              1.00      3.8±0.07µs        ? ?/sec    25.36    95.5±1.89µs        ? ?/sec
zip_8192_from_string_views size 10 and string_views size 100/null_vs_non_null_scalar/all_true               1.00      3.9±0.20µs        ? ?/sec    30.83   121.7±3.36µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/10pct_true     1.00      3.7±0.73µs        ? ?/sec    35.72   132.2±6.77µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/1pct_true      1.00      3.6±0.04µs        ? ?/sec    35.35   125.8±2.79µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_nulls    1.00      3.8±0.11µs        ? ?/sec    36.05   136.0±2.59µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_true     1.00      3.6±0.13µs        ? ?/sec    39.36   142.5±6.32µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/90pct_true     1.00      3.6±0.11µs        ? ?/sec    29.63   107.5±2.03µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/99pct_true     1.00      3.6±0.08µs        ? ?/sec    28.40   102.2±6.74µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/all_false      1.00      3.6±0.05µs        ? ?/sec    34.83   126.0±2.12µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_null_scalar_vs_null_scalar/all_true       1.00      3.6±0.05µs        ? ?/sec    27.38    98.6±1.62µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/10pct_true                  1.00     29.9±2.79µs        ? ?/sec    2.51     75.1±0.98µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/1pct_true                   1.00      7.2±0.16µs        ? ?/sec    9.48     68.3±1.01µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/50pct_nulls                 1.00     50.5±1.90µs        ? ?/sec    1.68     84.6±1.27µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/50pct_true                  1.00     64.4±0.60µs        ? ?/sec    1.53     98.6±1.71µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/90pct_true                  1.00     29.7±0.61µs        ? ?/sec    2.57     76.1±1.15µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/99pct_true                  1.00      7.9±0.09µs        ? ?/sec    8.89     70.5±2.13µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/all_false                   1.00      3.7±0.06µs        ? ?/sec    18.31    67.8±0.86µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/non_nulls_scalars/all_true                    1.00      3.7±0.06µs        ? ?/sec    18.35    67.9±1.16µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/10pct_true            1.00      3.8±0.12µs        ? ?/sec    28.20   107.5±2.55µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/1pct_true             1.00      3.9±0.16µs        ? ?/sec    25.73    99.5±2.19µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/50pct_nulls           1.00      4.1±0.14µs        ? ?/sec    29.98   122.2±2.27µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/50pct_true            1.00      3.8±0.08µs        ? ?/sec    37.05   140.1±2.01µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/90pct_true            1.00      3.9±0.20µs        ? ?/sec    33.52   131.8±3.10µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/99pct_true            1.00      3.8±0.09µs        ? ?/sec    33.55   127.6±3.56µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/all_false             1.00      3.8±0.08µs        ? ?/sec    26.47   100.8±5.55µs        ? ?/sec
zip_8192_from_string_views size 100 and string_views size 100/null_vs_non_null_scalar/all_true              1.00      3.9±0.06µs        ? ?/sec    32.05   124.6±2.16µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/10pct_true        1.00      3.6±0.40µs        ? ?/sec    35.16   126.4±1.92µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/1pct_true         1.00      3.5±0.07µs        ? ?/sec    35.43   123.6±4.98µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/50pct_nulls       1.00      3.7±0.06µs        ? ?/sec    36.06   132.4±1.80µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/50pct_true        1.00      3.6±0.06µs        ? ?/sec    38.44   136.9±2.82µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/90pct_true        1.00      3.5±0.04µs        ? ?/sec    29.82   105.2±2.25µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/99pct_true        1.00      3.5±0.08µs        ? ?/sec    27.48    96.9±1.69µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/all_false         1.00      3.6±0.12µs        ? ?/sec    33.80   123.0±2.52µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_null_scalar_vs_null_scalar/all_true          1.00      3.6±0.14µs        ? ?/sec    26.74    95.0±1.74µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/10pct_true                     1.00     27.9±0.32µs        ? ?/sec    2.65     73.9±1.31µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/1pct_true                      1.00      6.9±0.09µs        ? ?/sec    9.64     67.0±0.92µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/50pct_nulls                    1.00     49.0±0.60µs        ? ?/sec    1.73     84.7±2.45µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/50pct_true                     1.00     62.4±2.22µs        ? ?/sec    1.56     97.1±2.37µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/90pct_true                     1.00     28.7±0.37µs        ? ?/sec    2.59     74.1±1.17µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/99pct_true                     1.00      7.8±0.20µs        ? ?/sec    8.69     67.7±1.34µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/all_false                      1.00      3.6±0.09µs        ? ?/sec    18.78    68.2±2.16µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/non_nulls_scalars/all_true                       1.00      3.6±0.05µs        ? ?/sec    19.10   68.4±11.77µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/10pct_true               1.00      3.8±0.21µs        ? ?/sec    27.30   104.1±1.34µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/1pct_true                1.00      3.7±0.04µs        ? ?/sec    25.76    95.8±2.00µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/50pct_nulls              1.00      4.2±0.96µs        ? ?/sec    28.05   118.0±1.17µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/50pct_true               1.00      3.9±0.13µs        ? ?/sec    35.42   136.6±3.78µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/90pct_true               1.00      3.8±0.10µs        ? ?/sec    33.31   125.5±1.89µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/99pct_true               1.00      3.8±0.04µs        ? ?/sec    32.36   121.6±1.80µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/all_false                1.00      3.7±0.04µs        ? ?/sec    25.64    95.1±0.98µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 10/null_vs_non_null_scalar/all_true                 1.00      3.9±0.07µs        ? ?/sec    31.19   121.2±2.69µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/10pct_true       1.00      3.5±0.04µs        ? ?/sec    35.69   126.5±2.89µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/1pct_true        1.00      3.6±0.05µs        ? ?/sec    33.84   120.9±1.68µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_nulls      1.00      3.7±0.10µs        ? ?/sec    35.72   133.2±3.49µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/50pct_true       1.00      3.6±0.12µs        ? ?/sec    38.28   136.0±2.11µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/90pct_true       1.00      3.5±0.06µs        ? ?/sec    29.81   104.4±1.56µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/99pct_true       1.00      3.5±0.08µs        ? ?/sec    27.69    98.1±2.86µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/all_false        1.00      3.6±0.10µs        ? ?/sec    33.58   122.3±1.77µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_null_scalar_vs_null_scalar/all_true         1.00      3.5±0.08µs        ? ?/sec    26.79    94.7±1.02µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/10pct_true                    1.00     29.0±0.51µs        ? ?/sec    2.59     75.1±1.08µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/1pct_true                     1.00      7.4±0.10µs        ? ?/sec    9.41     69.2±1.76µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/50pct_nulls                   1.00     50.2±0.54µs        ? ?/sec    1.70     85.2±1.17µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/50pct_true                    1.00     64.1±1.59µs        ? ?/sec    1.51     96.9±1.22µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/90pct_true                    1.00     29.8±0.36µs        ? ?/sec    2.55     75.9±2.47µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/99pct_true                    1.00      8.2±0.17µs        ? ?/sec    8.24     67.8±1.11µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/all_false                     1.00      3.8±0.07µs        ? ?/sec    17.96    68.8±1.15µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/non_nulls_scalars/all_true                      1.00      3.8±0.12µs        ? ?/sec    17.37    66.1±0.97µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/10pct_true              1.00      3.8±0.27µs        ? ?/sec    27.57   105.2±3.06µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/1pct_true               1.00      3.7±0.08µs        ? ?/sec    25.44    94.8±0.94µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/50pct_nulls             1.00      3.9±0.07µs        ? ?/sec    30.10   118.6±2.83µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/50pct_true              1.00      3.9±0.30µs        ? ?/sec    35.20   135.6±1.67µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/90pct_true              1.00      3.9±0.55µs        ? ?/sec    32.58   125.9±2.14µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/99pct_true              1.00      3.8±0.36µs        ? ?/sec    32.47   122.9±4.15µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/all_false               1.00      3.8±0.10µs        ? ?/sec    25.24    94.9±0.97µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 100/null_vs_non_null_scalar/all_true                1.00      3.8±0.09µs        ? ?/sec    31.58   120.3±1.65µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/10pct_true         1.00      3.5±0.04µs        ? ?/sec    37.39   131.4±4.74µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/1pct_true          1.00      3.5±0.09µs        ? ?/sec    35.84   126.8±3.56µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/50pct_nulls        1.00      3.7±0.06µs        ? ?/sec    37.15   137.8±3.16µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/50pct_true         1.00      3.5±0.06µs        ? ?/sec    39.19   138.9±4.82µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/90pct_true         1.00      3.6±0.04µs        ? ?/sec    30.30   107.9±5.71µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/99pct_true         1.00      3.6±0.05µs        ? ?/sec    27.33    97.7±2.10µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/all_false          1.00      3.6±0.06µs        ? ?/sec    34.64   124.7±2.24µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_null_scalar_vs_null_scalar/all_true           1.00      3.7±0.19µs        ? ?/sec    26.17    96.9±1.75µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/10pct_true                      1.00     28.7±0.55µs        ? ?/sec    2.66     76.2±1.45µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/1pct_true                       1.00      7.2±0.12µs        ? ?/sec    9.58     69.0±0.80µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/50pct_nulls                     1.00     49.5±1.15µs        ? ?/sec    1.75     86.8±2.09µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/50pct_true                      1.00     62.6±0.88µs        ? ?/sec    1.65   103.4±16.82µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/90pct_true                      1.00     29.1±0.49µs        ? ?/sec    2.69     78.3±2.51µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/99pct_true                      1.00      7.8±0.09µs        ? ?/sec    9.01     70.2±1.72µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/all_false                       1.00      3.7±0.06µs        ? ?/sec    18.77    68.7±0.73µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/non_nulls_scalars/all_true                        1.00      3.6±0.10µs        ? ?/sec    18.73    68.2±1.44µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/10pct_true                1.00      3.9±0.11µs        ? ?/sec    27.68   106.9±2.29µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/1pct_true                 1.00      3.9±0.19µs        ? ?/sec    26.12   101.9±8.79µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/50pct_nulls               1.00      4.1±0.07µs        ? ?/sec    29.91   122.7±3.28µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/50pct_true                1.00      3.8±0.14µs        ? ?/sec    36.82   141.4±3.69µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/90pct_true                1.00      3.8±0.10µs        ? ?/sec    34.15   131.4±2.99µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/99pct_true                1.00      3.8±0.06µs        ? ?/sec    32.89   125.2±3.21µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/all_false                 1.00      3.8±0.06µs        ? ?/sec    26.05    99.2±2.30µs        ? ?/sec
zip_8192_from_string_views size 3 and string_views size 3/null_vs_non_null_scalar/all_true                  1.00      4.0±0.33µs        ? ?/sec    32.00  126.7±25.05µs        ? ?/sec

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 7, 2025
@mkleen mkleen changed the title Add custom implemenation for zip for string-views scalars Add custom implementation for zip for string-views scalars Dec 7, 2025
@mkleen mkleen force-pushed the zip-string-view-improv branch from 70809ca to d64eaf5 Compare December 7, 2025 18:11
@mkleen mkleen marked this pull request as ready for review December 7, 2025 18:12
@mkleen mkleen changed the title Add custom implementation for zip for string-views scalars Add custom implementation for zip for string-view scalars Dec 7, 2025
@mkleen mkleen force-pushed the zip-string-view-improv branch from d64eaf5 to 1e05651 Compare December 7, 2025 20:55
@mkleen mkleen changed the title Add custom implementation for zip for string-view scalars Add custom implementation for zip for utf8-view scalars Dec 7, 2025
@mkleen mkleen changed the title Add custom implementation for zip for utf8-view scalars Add custom implementation for zip for utf8View scalars Dec 7, 2025
@mkleen mkleen changed the title Add custom implementation for zip for utf8View scalars Add custom implementation for zip for Utf8View scalars Dec 7, 2025
@mkleen mkleen force-pushed the zip-string-view-improv branch 2 times, most recently from 2006eff to 724157f Compare December 7, 2025 22:29
@mkleen mkleen changed the title Add custom implementation for zip for Utf8View scalars Add special implementation for zip for Utf8View scalars Dec 7, 2025
@mkleen mkleen changed the title Add special implementation for zip for Utf8View scalars Add special case implementation for zip for Utf8View scalars Dec 7, 2025
@mkleen mkleen changed the title Add special case implementation for zip for Utf8View scalars Add special implementation for zip for Utf8View scalars Dec 7, 2025
@mkleen
Author

mkleen commented Dec 8, 2025

I have some decent speed-ups, see benchmarks. It would be great if someone could provide an initial review of this.

@mkleen mkleen force-pushed the zip-string-view-improv branch from 724157f to ba3c71a Compare December 8, 2025 10:24
@mkleen mkleen changed the title Add special implementation for zip for Utf8View scalars Add special implementation for zip for Utf8View/BinaryView scalars Dec 8, 2025
@mkleen
Author

mkleen commented Dec 9, 2025

@rluvaton Maybe you could have a look since you were the original author of this optimization? Thank you!

@rluvaton
Member

rluvaton commented Dec 9, 2025

Great job! Will try to review it today

Contributor

@alamb alamb left a comment

Thank you for this PR @mkleen -- could you please split out the benchmarks into a separate PR so it is easier to evaluate the performance difference due to this change?

@mkleen mkleen force-pushed the zip-string-view-improv branch from 310a81a to 918e2d0 Compare December 13, 2025 14:43
@mkleen mkleen requested a review from alamb December 13, 2025 20:49
Comment on lines +669 to +670
truthy: Option<GenericByteViewArray<T>>,
falsy: Option<GenericByteViewArray<T>>,
Contributor

I question this choice of holding a GenericByteViewArray itself here 🤔

Can we decompose it to just the buffers and view u128 perhaps?
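
A rough sketch of what such a decomposition could look like, written against the standard library only rather than the arrow crate. `ScalarView` and `zip_scalar_views` are hypothetical names that do not appear in the PR. The sketch assumes a scalar can be reduced to a single `u128` view (or `None` when it is null) and leaves the data buffers out entirely.

```rust
/// Hypothetical stand-in for a decomposed scalar: `Some(view)` for a value,
/// `None` for a null scalar. Data buffers are omitted in this sketch.
type ScalarView = Option<u128>;

/// Pick `truthy` where the mask is true and `falsy` elsewhere.
/// Returns the views plus an optional validity mask; `None` means "all valid".
fn zip_scalar_views(
    mask: &[bool],
    truthy: ScalarView,
    falsy: ScalarView,
) -> (Vec<u128>, Option<Vec<bool>>) {
    let pick = |m: bool| if m { truthy } else { falsy };

    // Null slots get an all-zero view as a placeholder.
    let views: Vec<u128> = mask.iter().map(|&m| pick(m).unwrap_or(0)).collect();

    // Only materialize validity when at least one side is null.
    let validity: Option<Vec<bool>> = if truthy.is_some() && falsy.is_some() {
        None
    } else {
        Some(mask.iter().map(|&m| pick(m).is_some()).collect())
    };

    (views, validity)
}
```

With both scalars present, the result is just a vector of copied views and no validity mask; the array type itself is never needed on the hot path.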

Comment on lines +683 to +689
fn get_value_from_scalar(scalar: &dyn Array) -> Option<GenericByteViewArray<T>> {
if scalar.is_null(0) {
None
} else {
Some(scalar.as_byte_view().clone())
}
}
Contributor

Personally I would inline this; there's a lot of functions here so we would benefit from removing simple ones like this


let bytes = Buffer::from(mutable);

(bytes.into(), buffers, Some(NullBuffer::new_valid(length)))
Contributor

Suggested change
- (bytes.into(), buffers, Some(NullBuffer::new_valid(length)))
+ (vec![views[0]; length].into(), buffers, None)

No need for null buffer if all are valid

Also can simplify buffer creation

// otherwise, we simply use the view.
let view_falsy = if falsy.total_buffer_bytes_used() > 0 {
let byte_view_falsy = ByteView::from(falsy.views()[0]);
let new_index_falsy_buffers = buffers.len() as u32;
Contributor

Suggested change
- let new_index_falsy_buffers = buffers.len() as u32;
+ let new_index_falsy_buffers = buffers.len() as u32 + byte_view_falsy.buffer_index;

We can't assume falsy only has 1 buffer
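
To illustrate the point, here is a minimal std-only sketch of rebasing a view's buffer index when one scalar's buffers are appended after another's. It assumes the Arrow view layout in which the low 32 bits hold the length and, for values longer than 12 bytes, bits 64..96 hold the buffer index and bits 96..128 the offset. `rebase_buffer_index` and `make_view` are hypothetical helpers, not functions from the PR or the arrow crate.

```rust
/// Views with at most this many bytes store their data inline and carry no buffer index.
const MAX_INLINE_LEN: u32 = 12;

/// Shift the buffer index of `view` by `base`, the number of buffers that
/// precede this scalar's buffers in the combined buffer list.
fn rebase_buffer_index(view: u128, base: u32) -> u128 {
    let length = view as u32; // low 32 bits
    if length <= MAX_INLINE_LEN {
        return view; // inline view: nothing to adjust
    }
    let buffer_index = (view >> 64) as u32;
    // Clear the old buffer index, then write the shifted one.
    let cleared = view & !(0xFFFF_FFFFu128 << 64);
    cleared | (((buffer_index + base) as u128) << 64)
}

/// Hypothetical helper that packs the four 32-bit fields of a view.
fn make_view(length: u32, prefix: u32, buffer_index: u32, offset: u32) -> u128 {
    (length as u128)
        | ((prefix as u128) << 32)
        | ((buffer_index as u128) << 64)
        | ((offset as u128) << 96)
}
```

A view that pointed at buffer 1 of the falsy scalar ends up pointing at buffer `base + 1` of the combined list, which is what the suggested `buffers.len() as u32 + byte_view_falsy.buffer_index` expresses.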

// All values are null
return Self::get_scalar_buffers_and_nulls_for_all_values_null(number_of_values);
}
let view = value.views()[0].to_byte_slice();
Contributor

Does this view have the right buffer offset?

Comment on lines +746 to +747
let total_number_of_bytes = true_count * view_truthy.len()
+ (predicate.len() - true_count) * view_falsy.to_byte_slice().len();
Contributor

view_truthy and view_falsy are u128s which we can know the constant size for; no need to call len() on their byte slices
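
As a small illustration, assuming every view is one `u128`, the total size depends only on the number of rows and not on `true_count`. `views_byte_len` and `views_byte_len_split` are hypothetical names used only for this sketch.

```rust
/// Bytes needed for `len` views; each view is one u128 (16 bytes)
/// regardless of which scalar it came from.
fn views_byte_len(len: usize) -> usize {
    len * std::mem::size_of::<u128>()
}

/// The split formula from the diff, rewritten with the constant view size.
/// It always agrees with `views_byte_len(len)` when `true_count <= len`.
fn views_byte_len_split(true_count: usize, len: usize) -> usize {
    let view_size = std::mem::size_of::<u128>();
    true_count * view_size + (len - true_count) * view_size
}
```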

(
bytes.into(),
buffers,
Some(NullBuffer::new_valid(predicate.len())),
Contributor

Suggested change
- Some(NullBuffer::new_valid(predicate.len())),
+ None,

All values are non-null so we don't need a null buffer

Comment on lines +694 to +697
let mut mutable = MutableBuffer::with_capacity(0);
mutable.repeat_slice_n_times((0u128).to_byte_slice(), len);

(mutable.into(), vec![], Some(NullBuffer::new_null(len)))
Contributor

Suggested change
- let mut mutable = MutableBuffer::with_capacity(0);
- mutable.repeat_slice_n_times((0u128).to_byte_slice(), len);
- (mutable.into(), vec![], Some(NullBuffer::new_null(len)))
+ (vec![0; len].into(), vec![], Some(NullBuffer::new_null(len)))

At this point it would be better to inline this instead of having this thin wrapper function

(mutable.into(), vec![], Some(NullBuffer::new_null(len)))
}

fn get_scalar_buffers_and_nulls_for_single_non_nullable(
Contributor

Suggested change
- fn get_scalar_buffers_and_nulls_for_single_non_nullable(
+ fn get_view_parts_single_value(

These function names could do with improving; they aren't very readable

Comment on lines +713 to +716
let mut bytes = MutableBuffer::with_capacity(0);
bytes.repeat_slice_n_times(view, number_of_values);

let bytes = Buffer::from(bytes);
Contributor

Suggested change
- let mut bytes = MutableBuffer::with_capacity(0);
- bytes.repeat_slice_n_times(view, number_of_values);
- let bytes = Buffer::from(bytes);
+ let bytes = vec![value.views()[0]; number_of_values];

No need to use MutableBuffer since our values (views) are simple u128s
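
A std-only sketch of why the two approaches are interchangeable: repeating a `u128` with `vec![...; n]` yields the same bytes as repeating its 16-byte representation `n` times. `repeat_view` and `repeat_view_bytes` are hypothetical names, and the byte-level version uses little-endian encoding as an assumption for the comparison.

```rust
/// Repeat one view `n` times as typed u128 values.
fn repeat_view(view: u128, n: usize) -> Vec<u128> {
    vec![view; n]
}

/// Byte-level equivalent, shown only for comparison:
/// repeat the 16 little-endian bytes of the view `n` times.
fn repeat_view_bytes(view: u128, n: usize) -> Vec<u8> {
    view.to_le_bytes().repeat(n)
}
```

The typed version avoids the manual byte handling and makes the element type visible at the call site.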

Labels

arrow Changes to the arrow crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement special case zip with scalar for Utf8View

4 participants