Releases · dgarage/llama.cpp

17 Oct 02:23

1bb4f43

b6782 Latest

mtmd : support home-cooked Mistral Small Omni (#14928)

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-10-17T02:23:37Z

llama-b6782-bin-macos-arm64.zip
sha256:969eb69e8fa7ae6fb3763e1b61940c4bbd494c6f0441639b94c5842d6a870dcf
10.4 MB 2025-10-17T02:23:49Z

llama-b6782-bin-macos-x64.zip
sha256:dc6659f92826e299942bfa6d22ecdefd2c5935261d6da574d3c903de46ccc9c5
27 MB 2025-10-17T02:23:50Z

llama-b6782-bin-ubuntu-vulkan-x64.zip
sha256:36280c5f3493cf20420ed9b67ebd1371f16585f24b5a412f37405d02a423d770
25.8 MB 2025-10-17T02:23:51Z

llama-b6782-bin-ubuntu-x64.zip
sha256:d9792bca70ac426936f65a726f76771f242a55d79a24808bc47a3d52b4d40dd4
12.5 MB 2025-10-17T02:23:53Z

llama-b6782-bin-win-cpu-arm64.zip
sha256:8d2ada89c1636d8c6c9bbc851ffcdae2ee2d4cf2e127d166d7ecaf4af4b6868c
10.6 MB 2025-10-17T02:23:54Z

llama-b6782-bin-win-cpu-x64.zip
sha256:c0c52624a14ec1b46123b0c01b09f96c06134f162061bb8f8d69c626be317df2
13.7 MB 2025-10-17T02:23:55Z

llama-b6782-bin-win-cuda-12.4-x64.zip
sha256:995c8d9cab41d461df5abdcfd42c826ac79b74230cff929e75e5194d215caf99
169 MB 2025-10-17T02:23:56Z

llama-b6782-bin-win-hip-radeon-x64.zip
sha256:d02b51c8623eaa1d921624b943df8b819ca6981493a498c272d54e0cb0754ccb
321 MB 2025-10-17T02:24:01Z

llama-b6782-bin-win-opencl-adreno-arm64.zip
sha256:ded203e4a7f93405bd44049e07f0fc1bd845645456130b09494b97d1107ea5b0
11 MB 2025-10-17T02:24:12Z

Source code (zip)
2025-10-16T17:00:31Z

Source code (tar.gz)
2025-10-16T17:00:31Z
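
Each binary asset above is listed with a `sha256:` digest, which can be compared against the digest of the downloaded file. The following is a minimal sketch of that check, assuming the archive has already been downloaded; the function names `sha256_of_file` and `matches_listed_digest` and the local file path are illustrative assumptions, not part of llama.cpp or its release tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_digest(path, listed):
    """Compare a file against a digest written as 'sha256:<hex>'."""
    algo, _, expected = listed.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest format: {listed!r}")
    return sha256_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    # Hypothetical local path; the digest is the one listed above for this asset.
    archive = Path("llama-b6782-bin-ubuntu-x64.zip")
    listed = "sha256:d9792bca70ac426936f65a726f76771f242a55d79a24808bc47a3d52b4d40dd4"
    if archive.exists():
        print("digest matches" if matches_listed_digest(archive, listed) else "digest MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch would suggest a corrupted or altered download, in which case the archive should not be unpacked.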

02 Oct 08:04

github-actions

b6665

95ce098

HIP: add IMbackK to codeowner (#16375)

Assets 15

18 Sep 05:58

github-actions

b6503

62c3b64

CANN: Remove print (#16044)

Signed-off-by: noemotiovon <757486878@qq.com>

Assets 15

14 Aug 07:05

github-actions

b6153

3ea913f

perplexity: give more information about constraints on failure (#15303)

* perplexity: give more information about constraints on failure

This checks whether -np is insufficient vs context, and provides clues as to how much is needed for each.

* log formatting

* log error and return instead of storing max_seq_exceeded int

* check if s0 is zero for -np check

Assets 15

13 Aug 02:13

github-actions

b6140

b049315

HIP: disable sync warp shuffle operators from clr amd_warp_sync_funct…

Assets 15

07 Aug 03:58

github-actions

b6106

5fd160b

ggml: Add basic SET_ROWS support in WebGPU (#15137)

* Begin work on set_rows

* Work on set rows

* Add error buffers for reporting unsupported SET_ROWS indices

* Remove extra comments

Assets 15

31 Jul 01:31

github-actions

b6039

6e67254

opencl: add `mul_mat_f32_f32_l4_lm` and `mul_mat_f16_f32_l4_lm` (#14809)

Assets 15

29 Jul 08:02

github-actions

b6020

0a5036b

CUDA: add roll (#14919)

* CUDA: add roll

* Make everything const, use __restrict__

Assets 15

18 Jul 01:18

github-actions

b5926

670e136

convert : fix Ernie4.5 MoE without shared experts (#14746)

Assets 15

03 Jun 05:19

github-actions

b5581

71e74a3

opencl: add `backend_synchronize` (#13939)

* This is not needed by the normal use where the result is read
  using `tensor_get`, but it allows perf mode of `test-backend-ops`
  to properly measure performance.

Assets 18
