Releases: AmpereComputingAI/llama.cpp

v3.4.0

26 Nov 12:51
9af0385

Based on ggml-org/llama.cpp b6735 (https://github.com/ggml-org/llama.cpp/releases/tag/b6735)

  • Fixed Flash Attention for SWA models
  • New Flash Attention algorithm, optimized for long contexts (above 1024 tokens). See the
    "Flash Attention algorithm selection" section for details on how to select the attention
    algorithm manually.

Also available at: DockerHub

v3.3.1

15 Oct 16:32
6219c16

Also available at: DockerHub

v3.3.0

09 Oct 12:54
6219c16

Also available at: DockerHub

v3.2.1

03 Sep 10:24
ecbcf6e

Also available at: DockerHub

v3.2.0

06 Aug 21:39
ecbcf6e

Also available at: DockerHub

v3.1.2

07 Jul 12:40
aa0a5d7

Also available at: DockerHub

v3.1.0

11 Jun 21:21
aa0a5d7

Also available at: DockerHub

v2.2.1

03 Jun 15:44
aa0a5d7

Update benchmark.py

v2.0.0

23 Sep 20:15
4f32b2c

  • Upgraded upstream tag enables Llama 3.1 in ollama
  • Support for the AmpereOne platform
  • Breaking change: due to changed weight type IDs, models must be re-quantized to the Q8R16 and Q4_K_4 formats with the current llama-quantize tool (see the sketch after this list).
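A minimal re-quantization sketch, assuming the positional usage of llama-quantize (input GGUF, output GGUF, target type) matches upstream llama.cpp; the file paths below are placeholders, and Q8R16 / Q4_K_4 are the type names from the note above:

    # Re-quantize an existing F16 GGUF model to the Ampere-optimized Q8R16 format
    ./llama-quantize ./models/my-model-f16.gguf ./models/my-model-q8r16.gguf Q8R16

    # Likewise for the Q4_K_4 format
    ./llama-quantize ./models/my-model-f16.gguf ./models/my-model-q4k4.gguf Q4_K_4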

v1.2.6

16 Jul 23:03
06e1efb

Create README.md