
failed to decode the batch #5

@hoverflow

Hi, when I use server-parallel, I get this error:

updateSlots : failed to decode the batch, n_batch = 1, ret = 1
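For reference, `ret` appears to be the return value of `llama_decode`. The comment on that function in `llama.h` documents 0 as success, 1 as "could not find a KV slot for the batch" (try a smaller batch or a larger context), and negative values as errors. If server-parallel follows the retry loop in llama.cpp's `examples/parallel`, a failed decode is retried with half the batch size, and the message is printed only once `n_batch` has dropped to 1. That would mean the KV cache has no free cell even for a single token. The toy Python sketch below is modelled on that loop under those assumptions. `ToyKVCache` and `decode_with_retry` are hypothetical names, not llama.cpp APIs.

```python
# Toy sketch (assumption): modelled on the retry loop in llama.cpp's
# examples/parallel, not copied from server-parallel itself.


class ToyKVCache:
    """Hypothetical stand-in for a KV cache with a fixed number of cells."""

    def __init__(self, n_ctx, used=0):
        self.n_ctx = n_ctx
        self.used = used

    def decode(self, n_tokens):
        # Mirrors llama_decode's documented codes: 0 = success, 1 = no free KV slot.
        if self.used + n_tokens > self.n_ctx:
            return 1
        self.used += n_tokens
        return 0


def decode_with_retry(cache, n_tokens, n_batch):
    i = 0
    while i < n_tokens:
        chunk = min(n_batch, n_tokens - i)
        ret = cache.decode(chunk)
        if ret != 0:
            if n_batch == 1 or ret < 0:
                # KV cache is full: even a single token no longer fits.
                return f"failed to decode the batch, n_batch = {n_batch}, ret = {ret}"
            n_batch //= 2  # retry the same position with a smaller batch
            continue
        i += chunk
    return "ok"


cache = ToyKVCache(n_ctx=1024, used=1023)
print(decode_with_retry(cache, n_tokens=2, n_batch=512))
```

With 1023 of 1024 cells in use and a 2-token batch, the loop halves `n_batch` from 512 down to 1, fits one token, and then reports `failed to decode the batch, n_batch = 1, ret = 1`, the same shape as the message above.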

This is the complete log leading up to the error:
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required = 107.54 MB
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 43/43 layers to GPU
llm_load_tensors: VRAM used: 8694.21 MB
...................................................................................................
llama_new_context_with_model: n_ctx = 1024
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 800.00 MB
llama_new_context_with_model: kv self size = 800.00 MB
llama_new_context_with_model: compute buffer total size = 118.13 MB
llama_new_context_with_model: VRAM scratch buffer: 112.00 MB
llama_new_context_with_model: total VRAM used: 9606.21 MB (model: 8694.21 MB, context: 912.00 MB)
Available slots:
 - slot 0
 - slot 1
llama server listening at http://0.0.0.0:8080

system prompt updated
slot 0 is processing
slot 0 released
slot 0 is processing
slot 0 released
slot 0 is processing
slot 0 released
slot 0 is processing
slot 0 released
slot 0 is processing
slot 1 is processing
updateSlots : failed to decode the batch, n_batch = 1, ret = 1
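The log reports `n_ctx = 1024`, even though the command passes `--ctx_size 2048`. The `kv self size = 800.00 MB` line is consistent with 1024 cells. The check below assumes an f16 cache, no grouped-query attention, and `n_embd = 5120`. That value is typical for a 40-layer 13B LLaMA model, but it is an assumption because the log does not print it. `kv_cache_bytes` is a hypothetical helper.

```python
# Back-of-the-envelope f16 KV-cache size (sketch).
# Assumed: n_embd = 5120 and no grouped-query attention, so K and V each
# store n_embd values per layer per context cell; the log does not print n_embd.

MIB = 1024 * 1024  # llama.cpp's "MB" in these log lines is 1024 * 1024 bytes


def kv_cache_bytes(n_ctx, n_layer, n_embd, bytes_per_elem=2):
    # One K and one V tensor per layer, each n_ctx x n_embd elements (f16 = 2 bytes).
    return 2 * n_layer * n_ctx * n_embd * bytes_per_elem


print(kv_cache_bytes(1024, 40, 5120) / MIB)  # 800.0, matches "kv self size = 800.00 MB"
print(kv_cache_bytes(2048, 40, 5120) / MIB)  # 1600.0
```

If the context really is 1024 cells shared by both slots, that would be consistent with the cache filling up once two requests are active.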

I run server-parallel with the following command:
./server-parallel -m models/xyz.gguf --ctx_size 2048 -t 4 -ngl 40 --host 0.0.0.0 --batch-size 512 --parallel 2

This only happens when both slots are running inference at the same time. Could you please help me resolve this issue?
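A toy budget makes the two-slot case concrete. It assumes, without having checked server-parallel's code, that both slots share the single KV cache of `n_ctx` cells, that the system prompt is stored once, and that each active slot consumes one cell per generated token. `steps_until_full` and the token counts in the example are hypothetical.

```python
def steps_until_full(n_ctx, system_tokens, prompt_tokens):
    """Generation steps before a shared KV cache runs out of cells (toy model)."""
    # Cells left after the shared system prompt and every active slot's prompt.
    free = n_ctx - system_tokens - sum(prompt_tokens)
    if free <= 0:
        return 0
    # Each step, every active slot appends one token to the shared cache.
    return free // len(prompt_tokens)


print(steps_until_full(1024, 200, [300]))       # one active slot: 524
print(steps_until_full(1024, 200, [300, 300]))  # two active slots: 112
```

Under these assumptions, a second concurrent request both adds its prompt to the cache and doubles the per-step consumption, so the cache fills much sooner than with a single slot.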
