Releases: dgarage/llama.cpp
b6782
b6665
HIP: add IMbackK to codeowner (#16375)
b6503
CANN: Remove print (#16044)
Signed-off-by: noemotiovon <757486878@qq.com>
b6153
perplexity: give more information about constraints on failure (#15303)
* perplexity: give more information about constraints on failure. This checks whether -np is insufficient relative to the context size and indicates how much of each is needed.
* log formatting
* log error and return instead of storing a max_seq_exceeded int
* check if s0 is zero for -np check
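The changelog does not show the actual check in llama.cpp's perplexity tool. The following is a hypothetical Python sketch of the kind of logic it describes: tasks are packed into a batch until either the token budget (context size) or the sequence budget (`-np`) runs out. If even the first task does not fit (`s0 == 0`), the function reports which constraint failed and how much is needed. `Task`, `pack_batch`, and the way budgets are counted are illustrative assumptions, not llama.cpp's code.

```python
from dataclasses import dataclass


@dataclass
class Task:
    n_tokens: int  # tokens the task needs in the context window (hypothetical)
    n_seqs: int    # parallel sequences it needs, e.g. one per candidate ending (>= 1)


def pack_batch(tasks, start, n_ctx, n_seq_max):
    """Greedily pack tasks[start:] into one batch; return the exclusive end index.

    Raises ValueError explaining the shortfall if not even the first task fits,
    loosely mirroring "log error and return" from the changelog.
    """
    n_tok = 0  # tokens used so far in this batch
    s0 = 0     # sequences used so far in this batch
    i = start
    while i < len(tasks):
        t = tasks[i]
        too_many_seqs = s0 + t.n_seqs > n_seq_max
        too_many_toks = n_tok + t.n_tokens > n_ctx
        if too_many_seqs or too_many_toks:
            if s0 == 0:
                # Nothing packed yet, so this task can never fit: say why.
                reasons = []
                if too_many_seqs:
                    reasons.append(f"needs {t.n_seqs} sequences but -np is {n_seq_max}")
                if too_many_toks:
                    reasons.append(f"needs {t.n_tokens} tokens but context is {n_ctx}")
                raise ValueError(f"task {i}: " + "; ".join(reasons))
            break  # batch is full; the caller evaluates it and resumes at i
        n_tok += t.n_tokens
        s0 += t.n_seqs
        i += 1
    return i
```

The `s0 == 0` test separates "this batch is full, evaluate it and continue" from "this task cannot fit under any batching", which is the only case that needs a user-facing error.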
b6140
HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…
b6106
ggml: Add basic SET_ROWS support in WebGPU (#15137)
* Begin work on set_rows
* Work on set rows
* Add error buffers for reporting unsupported SET_ROWS indices
* Remove extra comments
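SET_ROWS writes rows of a source tensor into a destination tensor at positions given by an index tensor, in effect `dst[idx[i]] = src[i]`. A GPU shader cannot raise an exception partway through, so recording problems in an error buffer that the host reads afterwards is a natural fit. The sketch below is a plain-Python model of that idea. The changelog does not say which indices the WebGPU backend treats as unsupported, so the out-of-range rule here is an assumption, and `set_rows` is a hypothetical name.

```python
def set_rows(dst, src, idx):
    """Scatter rows in place: dst[idx[i]] = src[i].

    Instead of failing mid-kernel, indices the (hypothetical) backend cannot
    handle are collected and returned, like an error buffer the host checks.
    """
    errors = []  # positions i in src whose index was rejected
    for i, row in enumerate(src):
        r = idx[i]
        # Assumed support rule: the target row must exist in dst.
        if r < 0 or r >= len(dst):
            errors.append(i)
            continue
        dst[r] = list(row)  # copy the row so dst does not alias src
    return errors
```

Usage: after `set_rows(dst, [[1, 2], [3, 4], [5, 6]], [2, 0, 7])` on a three-row `dst`, rows 2 and 0 are written and the returned list `[2]` flags the third source row, whose index 7 has no destination.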
b6039
opencl: add `mul_mat_f32_f32_l4_lm` and `mul_mat_f16_f32_l4_lm` (#14809)
b6020
CUDA: add roll (#14919)
* CUDA: add roll
* Make everything const, use __restrict__
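ggml's roll operation shifts tensor elements by a per-dimension offset, and elements pushed past the end wrap around to the beginning. The following is a minimal CPU reference in Python for the 2-D case, the kind of thing a CUDA kernel's output could be checked against. It assumes ggml's convention that dimension 0 is the innermost (column) dimension. `roll2d` is a hypothetical helper, not the ggml API.

```python
def roll2d(x, shift0, shift1):
    """Roll a row-major 2-D list with wrap-around.

    The element at (r, c) moves to ((r + shift1) % rows, (c + shift0) % cols).
    shift0 acts on columns (dimension 0), shift1 on rows (dimension 1).
    Python's % already maps negative shifts into range.
    """
    rows, cols = len(x), len(x[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[(r + shift1) % rows][(c + shift0) % cols] = x[r][c]
    return out
```

For example, rolling `[[1, 2, 3], [4, 5, 6]]` by `shift0=1` gives `[[3, 1, 2], [6, 4, 5]]`. A GPU kernel would typically invert the mapping instead, having each output element compute its source index, so that every thread writes exactly one element.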
b5926
convert : fix Ernie4.5 MoE without shared experts (#14746)
b5581
opencl: add `backend_synchronize` (#13939)
* This is not needed in normal use, where the result is read using `tensor_get`, but it allows the perf mode of `test-backend-ops` to measure performance properly.
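The reason an explicit synchronize matters for benchmarking is that GPU work is queued asynchronously. A timer stopped right after submission measures only the enqueue cost, whereas reading a result (as `tensor_get` does) implicitly waits for completion. The sketch below is an analogy only: a Python thread pool stands in for the OpenCL command queue, and `fake_kernel`/`time_kernel` are hypothetical names unrelated to ggml's API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(seconds):
    """Stand-in for device work that runs after it has been submitted."""
    time.sleep(seconds)
    return "done"


def time_kernel(seconds=0.05):
    """Return (time without waiting, time including an explicit wait)."""
    with ThreadPoolExecutor(max_workers=1) as queue:
        t0 = time.perf_counter()
        fut = queue.submit(fake_kernel, seconds)  # returns immediately
        submit_only = time.perf_counter() - t0    # enqueue cost only
        fut.result()                               # "synchronize": block until done
        with_sync = time.perf_counter() - t0       # includes the actual work
    return submit_only, with_sync
```

Only the second number reflects the kernel's cost, which is why a perf mode that never reads results back needs a synchronize call before stopping its timer.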