18 changes: 2 additions & 16 deletions docs.json
@@ -36,7 +36,7 @@
"logo": {
"light": "/logo/light.svg",
"dark": "/logo/dark.svg",
"href": "/"
"href": "https://liquid.ai"
},
"navbar": {
"links": [
@@ -52,20 +52,6 @@
}
},
"navigation": {
"global": {
"anchors": [
{
"anchor": "About Us",
"icon": "building",
"href": "https://www.liquid.ai/company/about"
},
{
"anchor": "Blog",
"icon": "pencil",
"href": "https://www.liquid.ai/company/blog"
}
]
},
"tabs": [
{
"tab": "Documentation",
@@ -202,7 +188,7 @@
]
},
{
"tab": "Guides",
"tab": "Examples",
"groups": [
{
"group": "Get Started",
4 changes: 2 additions & 2 deletions docs/fine-tuning/leap-finetune.mdx
@@ -20,10 +20,10 @@ LEAP Finetune will provide:
While LEAP Finetune is in development, you can fine-tune models using:

<CardGroup cols={2}>
<Card title="TRL" icon="graduation-cap" href="/lfm/fine-tuning/trl">
<Card title="TRL" icon="graduation-cap" href="/docs/fine-tuning/trl">
Hugging Face's training library with LoRA/QLoRA support
</Card>
<Card title="Unsloth" icon="zap" href="/lfm/fine-tuning/unsloth">
<Card title="Unsloth" icon="zap" href="/docs/fine-tuning/unsloth">
Memory-efficient fine-tuning with 2x faster training
</Card>
</CardGroup>
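
A rough LoRA fine-tuning sketch with TRL is shown below. It is illustrative only: the model ID and dataset are assumptions, so substitute the LFM checkpoint and data you actually want to train on.

```python
# Illustrative LoRA fine-tuning sketch with TRL (model ID and dataset are placeholders)
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # demo chat dataset

trainer = SFTTrainer(
    model="LiquidAI/LFM2-1.2B",  # assumed checkpoint; use the LFM model you want to tune
    train_dataset=dataset,
    args=SFTConfig(output_dir="lfm2-sft-lora"),
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```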
4 changes: 1 addition & 3 deletions docs/frameworks/outlines.mdx
@@ -19,7 +19,7 @@ pip install outlines transformers torch

## Setup[​](#setup "Direct link to Setup")

Outlines provides a simple interface for constrained generation. The examples below use Transformers, but Outlines works with all major inference frameworks including [vLLM](/lfm/inference/vllm), [llama.cpp](/lfm/inference/llama-cpp), [MLX](/lfm/inference/mlx), [Ollama](/lfm/inference/ollama), and more. See the [Outlines documentation](https://dottxt-ai.github.io/outlines/latest/) for framework-specific examples.
Outlines provides a simple interface for constrained generation. The examples below use Transformers, but Outlines works with all major inference frameworks including [vLLM](/docs/inference/vllm), [llama.cpp](/docs/inference/llama-cpp), [MLX](/docs/inference/mlx), [Ollama](/docs/inference/ollama), and more. See the [Outlines documentation](https://dottxt-ai.github.io/outlines/latest/) for framework-specific examples.

Start by wrapping your model:
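
A rough sketch of what this wrapping might look like with a 1.x-style Outlines API (the exact calls vary across Outlines versions, and the model ID is an assumption):

```python
# Sketch: wrap an LFM checkpoint for constrained generation with Outlines
import outlines
from typing import Literal
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # assumed model ID; substitute the checkpoint you use
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(model_id),
    AutoTokenizer.from_pretrained(model_id),
)

# Constrain the output to a fixed set of choices
answer = model("Is the sky blue? Answer yes or no.", Literal["yes", "no"])
print(answer)
```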

@@ -263,5 +263,3 @@ For a detailed example of using Outlines with LFM2-350M for smart home control,
* [Outlines GitHub](https://github.com/dottxt-ai/outlines)
* [Outlines Documentation](https://dottxt-ai.github.io/outlines/)
* [LFM2 × .txt Collaboration Blog Post](https://www.liquid.ai/blog/liquid-txt-collaboration)

[Edit this page](https://github.com/Liquid4All/docs/tree/main/lfm/frameworks/outlines.md)
4 changes: 2 additions & 2 deletions docs/help/contributing.mdx
@@ -102,8 +102,8 @@ Use Mintlify components appropriately:

### Links

- Use relative links for internal pages: `/lfm/inference/transformers`
- Use descriptive link text: "See the [inference guide](/lfm/inference/transformers)" not "Click [here](/lfm/inference/transformers)"
- Use relative links for internal pages: `/docs/inference/transformers`
- Use descriptive link text: "See the [inference guide](/docs/inference/transformers)" not "Click [here](/docs/inference/transformers)"

## What to Contribute

16 changes: 8 additions & 8 deletions docs/help/faqs.mdx
@@ -15,11 +15,11 @@ All LFM models support a 32k token text context length for extended conversation

<Accordion title="Which inference frameworks are supported?">
LFM models are compatible with:
- [Transformers](/lfm/inference/transformers) - For research and development
- [llama.cpp](/lfm/inference/llama-cpp) - For efficient CPU inference
- [vLLM](/lfm/inference/vllm) - For high-throughput production serving
- [MLX](/lfm/inference/mlx) - For Apple Silicon optimization
- [Ollama](/lfm/inference/ollama) - For easy local deployment
- [Transformers](/docs/inference/transformers) - For research and development
- [llama.cpp](/docs/inference/llama-cpp) - For efficient CPU inference
- [vLLM](/docs/inference/vllm) - For high-throughput production serving
- [MLX](/docs/inference/mlx) - For Apple Silicon optimization
- [Ollama](/docs/inference/ollama) - For easy local deployment
- [LEAP](/leap/index) - For edge and mobile deployment
</Accordion>

@@ -39,7 +39,7 @@ LFM2.5 models are updated versions with improved training that deliver higher pe
</Accordion>

<Accordion title="What are Liquid Nanos?">
[Liquid Nanos](/lfm/models/liquid-nanos) are task-specific models fine-tuned for specialized use cases like:
[Liquid Nanos](/docs/models/liquid-nanos) are task-specific models fine-tuned for specialized use cases like:
- Information extraction (LFM2-Extract)
- Translation (LFM2-350M-ENJP-MT)
- RAG question answering (LFM2-1.2B-RAG)
@@ -69,7 +69,7 @@ For most use cases, Q4_K_M or Q5_K_M provide good quality with significant size
## Fine-tuning

<Accordion title="Can I fine-tune LFM models?">
Yes! Most LFM models support fine-tuning with [TRL](/lfm/fine-tuning/trl) and [Unsloth](/lfm/fine-tuning/unsloth). Check the [Complete Model Library](/lfm/models/complete-library) for trainability information.
Yes! Most LFM models support fine-tuning with [TRL](/docs/fine-tuning/trl) and [Unsloth](/docs/fine-tuning/unsloth). Check the [Model Library](/docs/models/complete-library) for trainability information.
</Accordion>

<Accordion title="What fine-tuning methods are supported?">
@@ -82,4 +82,4 @@

- Join our [Discord community](https://discord.gg/DFU3WQeaYD) for real-time help
- Check the [Cookbook](https://github.com/Liquid4All/cookbook) for examples
- See [Troubleshooting](/lfm/help/troubleshooting) for common issues
- See [Troubleshooting](/docs/help/troubleshooting) for common issues
4 changes: 2 additions & 2 deletions docs/index.mdx
@@ -3,6 +3,6 @@ title: "LFM Documentation"
description: "Redirect to LFM Getting Started"
---

<meta http-equiv="refresh" content="0; url=/lfm/getting-started/intro" />
<meta http-equiv="refresh" content="0; url=/docs/getting-started/welcome" />

Redirecting to [Getting Started](/lfm/getting-started/intro)...
Redirecting to [Getting Started](/docs/getting-started/welcome)...
135 changes: 15 additions & 120 deletions docs/inference/llama-cpp.mdx
@@ -114,67 +114,25 @@ hf download LiquidAI/LFM2.5-1.2B-Instruct-GGUF lfm2.5-1.2b-instruct-q4_k_m.gguf

## Basic Usage

llama.cpp offers three main interfaces for running inference: `llama-cpp-python` (Python bindings), `llama-server` (OpenAI-compatible server), and `llama-cli` (interactive CLI).
llama.cpp offers two main interfaces for running inference: `llama-server` (OpenAI-compatible server) and `llama-cli` (interactive CLI).

<Tabs>
<Tab title="llama-cpp-python">
For Python applications, use the `llama-cpp-python` package.

**Installation:**
```bash
pip install llama-cpp-python
```

For GPU support:
```bash
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
```

**Model Setup:**
```python
from llama_cpp import Llama

# Load model
llm = Llama(
model_path="lfm2.5-1.2b-instruct-q4_k_m.gguf",
n_ctx=4096,
n_threads=8
)

# Generate text
output = llm(
"What is artificial intelligence?",
max_tokens=512,
temperature=0.7,
top_p=0.9
)
print(output["choices"][0]["text"])
```

**Chat Completions:**
```python
response = llm.create_chat_completion(
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing."}
],
temperature=0.7,
max_tokens=512
)
print(response["choices"][0]["message"]["content"])
```
</Tab>

<Tab title="llama-server">
llama-server provides an OpenAI-compatible API for serving models locally.

**Starting the Server:**
```bash
llama-server -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF -c 4096 --port 8080
```

The `-hf` flag downloads the model directly from Hugging Face. Alternatively, use a local model file:
```bash
llama-server -m lfm2.5-1.2b-instruct-q4_k_m.gguf -c 4096 --port 8080
```

Key parameters:
* `-m`: Path to GGUF model file
* `-hf`: Hugging Face model ID (downloads automatically)
* `-m`: Path to local GGUF model file
* `-c`: Context length (default: 4096)
* `--port`: Server port (default: 8080)
* `-ngl 99`: Offload layers to GPU (if available)
@@ -216,12 +174,18 @@ llama.cpp offers three main interfaces for running inference: `llama-cpp-python`
<Tab title="llama-cli">
llama-cli provides an interactive terminal interface for chatting with models.

```bash
llama-cli -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF -c 4096 --color -i
```

The `-hf` flag downloads the model directly from Hugging Face. Alternatively, use a local model file:
```bash
llama-cli -m lfm2.5-1.2b-instruct-q4_k_m.gguf -c 4096 --color -i
```

Key parameters:
* `-m`: Path to GGUF model file
* `-hf`: Hugging Face model ID (downloads automatically)
* `-m`: Path to local GGUF model file
* `-c`: Context length
* `--color`: Colored output
* `-i`: Interactive mode
@@ -242,43 +206,6 @@ Control text generation behavior using parameters in the OpenAI-compatible API o
* **`repetition_penalty`** / **`--repeat-penalty`** (`float`, default 1.1): Penalty for repeating tokens (>1.0 = discourage repetition). Typical range: 1.0-1.5
* **`stop`** (`str` or `list[str]`): Strings that terminate generation when encountered

<Accordion title="llama-cpp-python example">
```python
from llama_cpp import Llama

llm = Llama(
model_path="lfm2.5-1.2b-instruct-q4_k_m.gguf",
n_ctx=4096,
n_threads=8
)

# Text generation with sampling parameters
output = llm(
"What is machine learning?",
max_tokens=512,
temperature=0.7,
top_p=0.9,
top_k=40,
repeat_penalty=1.1,
stop=["<|im_end|>", "<|endoftext|>"]
)
print(output["choices"][0]["text"])

# Chat completion with sampling parameters
response = llm.create_chat_completion(
messages=[
{"role": "user", "content": "Explain quantum computing."}
],
temperature=0.7,
top_p=0.9,
top_k=40,
max_tokens=512,
repeat_penalty=1.1
)
print(response["choices"][0]["message"]["content"])
```
</Accordion>

<Accordion title="llama-server (OpenAI-compatible API) example">
```python
from openai import OpenAI
@@ -407,38 +334,6 @@ hf download LiquidAI/LFM2-VL-1.6B-GGUF mmproj-LFM2-VL-1.6B-Q8_0.gguf --local-dir
```
</Accordion>

<Accordion title="Using llama-cpp-python">
```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Initialize with vision support
# Note: Use the correct chat handler for your model architecture
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")

llm = Llama(
model_path="lfm2.5-vl-1.6b-q4_k_m.gguf",
chat_handler=chat_handler,
n_ctx=4096
)

# Generate with image
response = llm.create_chat_completion(
messages=[
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "file:///path/to/image.jpg"}},
{"type": "text", "text": "Describe this image."}
]
}
],
max_tokens=256
)
print(response["choices"][0]["message"]["content"])
```
</Accordion>

<Info>
For a complete working example with step-by-step instructions, see the [llama.cpp Vision Model Colab notebook](https://colab.research.google.com/drive/1q2PjE6O_AahakRlkTNJGYL32MsdUcj7b?usp=sharing).
</Info>
7 changes: 6 additions & 1 deletion docs/inference/transformers.mdx
@@ -14,9 +14,14 @@ Transformers provides the most flexibility for model development and is ideal fo
Install the required dependencies:

```bash
pip install transformers>=4.57.1 torch>=2.6
pip install "transformers>=5.0.0" torch
```

> **Note:** Transformers v5 is newly released. If you encounter issues, fall back to the pinned git source:
> ```bash
> pip install git+https://github.com/huggingface/transformers.git@0c9a72e4576fe4c84077f066e585129c97bfd4e6 torch
> ```

GPU is recommended for faster inference.
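
As a quick sanity check, something like the following confirms the installed Transformers version and whether a GPU is visible:

```python
# Quick environment check: Transformers version and GPU availability
import torch
import transformers

print(transformers.__version__)   # expect 5.x (or the pinned commit build)
print(torch.cuda.is_available())  # True if a CUDA GPU is usable
```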

## Basic Usage
7 changes: 7 additions & 0 deletions docs/inference/vllm.mdx
@@ -185,9 +185,16 @@ To use LFM Vision Models with vLLM, install the precompiled wheel along with the
VLLM_PRECOMPILED_WHEEL_COMMIT=72506c98349d6bcd32b4e33eec7b5513453c1502 VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/vllm-project/vllm.git
```

```bash
pip install "transformers>=5.0.0" pillow
```

<Note>
Transformers v5 is newly released. If you encounter issues, fall back to the pinned git source:
```bash
pip install git+https://github.com/huggingface/transformers.git@3c2517727ce28a30f5044e01663ee204deb1cdbe pillow
```
</Note>

This installs vLLM with the necessary changes for LFM Vision Model support. Once these changes are merged upstream, you'll be able to use the standard vLLM installation.
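
Once installed, serving and querying a vision model typically goes through vLLM's OpenAI-compatible server. The sketch below is a rough illustration: the model ID, port, and image URL are assumptions, and it presumes the server was started with something like `vllm serve LiquidAI/LFM2-VL-1.6B`.

```python
# Sketch: query a vision model served by vLLM's OpenAI-compatible API
# (model ID, port, and image URL are placeholders)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="LiquidAI/LFM2-VL-1.6B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```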
