Based on ggml-org/llama.cpp b6735 (https://github.com/ggml-org/llama.cpp/releases/tag/b6735)
- Fixed Flash Attention for SWA (sliding-window attention) models
- New Flash Attention algorithm, optimized for long contexts (above 1024 tokens). See the
"Flash Attention algorithm selection" section for details on how to select the attention
algorithm manually.
Also available at: DockerHub