
v3.4.0

jan-grzybek-ampere released this 26 Nov 12:51 · 9af0385

Based on ggml-org/llama.cpp b6735 (https://github.com/ggml-org/llama.cpp/releases/tag/b6735)

  • Fixed Flash Attention for SWA (sliding-window attention) models.
  • Added a new Flash Attention algorithm optimized for long contexts (above 1024 tokens). See the
    "Flash Attention algorithm selection" section for details on how to select the attention
    algorithm manually; a rough usage sketch follows this list.
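As a rough illustration only, the sketch below runs a model with Flash Attention enabled at a long context, where the new algorithm is targeted. The binary path, model path, and the exact form of the `--flash-attn` flag are assumptions (older llama.cpp builds take it as a bare switch, newer ones accept on/off/auto); the fork-specific manual algorithm override is documented in the "Flash Attention algorithm selection" section and is not reproduced here.

```python
# Minimal sketch: launch llama-cli with Flash Attention at a long context.
# Assumptions: ./llama-cli is built from this release, ./model.gguf exists,
# and this build accepts "--flash-attn on" (older builds use a bare --flash-attn).
import subprocess

subprocess.run(
    [
        "./llama-cli",
        "-m", "./model.gguf",    # model path (placeholder)
        "-c", "4096",            # context length above 1024 tokens
        "--flash-attn", "on",    # enable Flash Attention
        "-p", "Hello",           # short prompt just to exercise the run
        "-n", "32",              # generate a few tokens
    ],
    check=True,
)
```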

Also available at: DockerHub