Currently we implement add_T_range by converting the range to a std::span and calling add_T_array. In many (perhaps most) cases it would be more efficient to create a C array directly in add_T_range and call move_T_array instead, which can save a copy.
This is particularly true for add_bool_range, which currently requires two copies.