18 changes: 2 additions & 16 deletions docs.json
@@ -36,7 +36,7 @@
"logo": {
"light": "/logo/light.svg",
"dark": "/logo/dark.svg",
"href": "/"
"href": "https://liquid.ai"
},
"navbar": {
"links": [
@@ -52,20 +52,6 @@
}
},
"navigation": {
"global": {
"anchors": [
{
"anchor": "About Us",
"icon": "building",
"href": "https://www.liquid.ai/company/about"
},
{
"anchor": "Blog",
"icon": "pencil",
"href": "https://www.liquid.ai/company/blog"
}
]
},
"tabs": [
{
"tab": "Documentation",
@@ -202,7 +188,7 @@
]
},
{
"tab": "Guides",
"tab": "Examples",
"groups": [
{
"group": "Get Started",
4 changes: 2 additions & 2 deletions docs/fine-tuning/leap-finetune.mdx
@@ -20,10 +20,10 @@ LEAP Finetune will provide:
While LEAP Finetune is in development, you can fine-tune models using:

<CardGroup cols={2}>
<Card title="TRL" icon="graduation-cap" href="/lfm/fine-tuning/trl">
<Card title="TRL" icon="graduation-cap" href="/docs/fine-tuning/trl">
Hugging Face's training library with LoRA/QLoRA support
</Card>
<Card title="Unsloth" icon="zap" href="/lfm/fine-tuning/unsloth">
<Card title="Unsloth" icon="zap" href="/docs/fine-tuning/unsloth">
Memory-efficient fine-tuning with 2x faster training
</Card>
</CardGroup>
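
A rough LoRA fine-tuning sketch with TRL is shown below. It is illustrative only: the model ID and dataset are assumptions, so substitute the LFM checkpoint and data you actually want to train on.

```python
# Illustrative LoRA fine-tuning sketch with TRL (model ID and dataset are placeholders)
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # demo chat dataset

trainer = SFTTrainer(
    model="LiquidAI/LFM2-1.2B",  # assumed checkpoint; use the LFM model you want to tune
    train_dataset=dataset,
    args=SFTConfig(output_dir="lfm2-sft-lora"),
    peft_config=LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"),
)
trainer.train()
```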
4 changes: 1 addition & 3 deletions docs/frameworks/outlines.mdx
@@ -19,7 +19,7 @@ pip install outlines transformers torch

## Setup[​](#setup "Direct link to Setup")

Outlines provides a simple interface for constrained generation. The examples below use Transformers, but Outlines works with all major inference frameworks including [vLLM](/lfm/inference/vllm), [llama.cpp](/lfm/inference/llama-cpp), [MLX](/lfm/inference/mlx), [Ollama](/lfm/inference/ollama), and more. See the [Outlines documentation](https://dottxt-ai.github.io/outlines/latest/) for framework-specific examples.
Outlines provides a simple interface for constrained generation. The examples below use Transformers, but Outlines works with all major inference frameworks including [vLLM](/docs/inference/vllm), [llama.cpp](/docs/inference/llama-cpp), [MLX](/docs/inference/mlx), [Ollama](/docs/inference/ollama), and more. See the [Outlines documentation](https://dottxt-ai.github.io/outlines/latest/) for framework-specific examples.

Start by wrapping your model:
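
A rough sketch of what this wrapping might look like with a 1.x-style Outlines API (the exact calls vary across Outlines versions, and the model ID is an assumption):

```python
# Sketch: wrap an LFM checkpoint for constrained generation with Outlines
import outlines
from typing import Literal
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # assumed model ID; substitute the checkpoint you use
model = outlines.from_transformers(
    AutoModelForCausalLM.from_pretrained(model_id),
    AutoTokenizer.from_pretrained(model_id),
)

# Constrain the output to a fixed set of choices
answer = model("Is the sky blue? Answer yes or no.", Literal["yes", "no"])
print(answer)
```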

@@ -263,5 +263,3 @@ For a detailed example of using Outlines with LFM2-350M for smart home control,
* [Outlines GitHub](https://github.com/dottxt-ai/outlines)
* [Outlines Documentation](https://dottxt-ai.github.io/outlines/)
* [LFM2 × .txt Collaboration Blog Post](https://www.liquid.ai/blog/liquid-txt-collaboration)

[Edit this page](https://github.com/Liquid4All/docs/tree/main/lfm/frameworks/outlines.md)
4 changes: 2 additions & 2 deletions docs/help/contributing.mdx
@@ -102,8 +102,8 @@ Use Mintlify components appropriately:

### Links

- Use relative links for internal pages: `/lfm/inference/transformers`
- Use descriptive link text: "See the [inference guide](/lfm/inference/transformers)" not "Click [here](/lfm/inference/transformers)"
- Use relative links for internal pages: `/docs/inference/transformers`
- Use descriptive link text: "See the [inference guide](/docs/inference/transformers)" not "Click [here](/docs/inference/transformers)"

## What to Contribute

16 changes: 8 additions & 8 deletions docs/help/faqs.mdx
@@ -15,11 +15,11 @@ All LFM models support a 32k token text context length for extended conversation

<Accordion title="Which inference frameworks are supported?">
LFM models are compatible with:
- [Transformers](/lfm/inference/transformers) - For research and development
- [llama.cpp](/lfm/inference/llama-cpp) - For efficient CPU inference
- [vLLM](/lfm/inference/vllm) - For high-throughput production serving
- [MLX](/lfm/inference/mlx) - For Apple Silicon optimization
- [Ollama](/lfm/inference/ollama) - For easy local deployment
- [Transformers](/docs/inference/transformers) - For research and development
- [llama.cpp](/docs/inference/llama-cpp) - For efficient CPU inference
- [vLLM](/docs/inference/vllm) - For high-throughput production serving
- [MLX](/docs/inference/mlx) - For Apple Silicon optimization
- [Ollama](/docs/inference/ollama) - For easy local deployment
- [LEAP](/leap/index) - For edge and mobile deployment
</Accordion>

@@ -39,7 +39,7 @@ LFM2.5 models are updated versions with improved training that deliver higher pe
</Accordion>

<Accordion title="What are Liquid Nanos?">
[Liquid Nanos](/lfm/models/liquid-nanos) are task-specific models fine-tuned for specialized use cases like:
[Liquid Nanos](/docs/models/liquid-nanos) are task-specific models fine-tuned for specialized use cases like:
- Information extraction (LFM2-Extract)
- Translation (LFM2-350M-ENJP-MT)
- RAG question answering (LFM2-1.2B-RAG)
@@ -69,7 +69,7 @@ For most use cases, Q4_K_M or Q5_K_M provide good quality with significant size
## Fine-tuning

<Accordion title="Can I fine-tune LFM models?">
Yes! Most LFM models support fine-tuning with [TRL](/lfm/fine-tuning/trl) and [Unsloth](/lfm/fine-tuning/unsloth). Check the [Complete Model Library](/lfm/models/complete-library) for trainability information.
Yes! Most LFM models support fine-tuning with [TRL](/docs/fine-tuning/trl) and [Unsloth](/docs/fine-tuning/unsloth). Check the [Model Library](/docs/models/complete-library) for trainability information.
</Accordion>

<Accordion title="What fine-tuning methods are supported?">
@@ -82,4 +82,4 @@

- Join our [Discord community](https://discord.gg/DFU3WQeaYD) for real-time help
- Check the [Cookbook](https://github.com/Liquid4All/cookbook) for examples
- See [Troubleshooting](/lfm/help/troubleshooting) for common issues
- See [Troubleshooting](/docs/help/troubleshooting) for common issues
4 changes: 2 additions & 2 deletions docs/index.mdx
@@ -3,6 +3,6 @@ title: "LFM Documentation"
description: "Redirect to LFM Getting Started"
---

<meta http-equiv="refresh" content="0; url=/lfm/getting-started/intro" />
<meta http-equiv="refresh" content="0; url=/docs/getting-started/welcome" />

Redirecting to [Getting Started](/lfm/getting-started/intro)...
Redirecting to [Getting Started](/docs/getting-started/welcome)...
135 changes: 15 additions & 120 deletions docs/inference/llama-cpp.mdx
@@ -114,67 +114,25 @@ hf download LiquidAI/LFM2.5-1.2B-Instruct-GGUF lfm2.5-1.2b-instruct-q4_k_m.gguf

## Basic Usage

llama.cpp offers three main interfaces for running inference: `llama-cpp-python` (Python bindings), `llama-server` (OpenAI-compatible server), and `llama-cli` (interactive CLI).
llama.cpp offers two main interfaces for running inference: `llama-server` (OpenAI-compatible server) and `llama-cli` (interactive CLI).

<Tabs>
<Tab title="llama-cpp-python">
For Python applications, use the `llama-cpp-python` package.

**Installation:**
```bash
pip install llama-cpp-python
```

For GPU support:
```bash
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install llama-cpp-python
```

**Model Setup:**
```python
from llama_cpp import Llama

# Load model
llm = Llama(
model_path="lfm2.5-1.2b-instruct-q4_k_m.gguf",
n_ctx=4096,
n_threads=8
)

# Generate text
output = llm(
"What is artificial intelligence?",
max_tokens=512,
temperature=0.7,
top_p=0.9
)
print(output["choices"][0]["text"])
```

**Chat Completions:**
```python
response = llm.create_chat_completion(
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing."}
],
temperature=0.7,
max_tokens=512
)
print(response["choices"][0]["message"]["content"])
```
</Tab>

<Tab title="llama-server">
llama-server provides an OpenAI-compatible API for serving models locally.

**Starting the Server:**
```bash
llama-server -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF -c 4096 --port 8080
```

The `-hf` flag downloads the model directly from Hugging Face. Alternatively, use a local model file:
```bash
llama-server -m lfm2.5-1.2b-instruct-q4_k_m.gguf -c 4096 --port 8080
```

Key parameters:
* `-m`: Path to GGUF model file
* `-hf`: Hugging Face model ID (downloads automatically)
* `-m`: Path to local GGUF model file
* `-c`: Context length (default: 4096)
* `--port`: Server port (default: 8080)
* `-ngl 99`: Offload layers to GPU (if available)
@@ -216,12 +174,18 @@ llama.cpp offers three main interfaces for running inference: `llama-cpp-python`
<Tab title="llama-cli">
llama-cli provides an interactive terminal interface for chatting with models.

```bash
llama-cli -hf LiquidAI/LFM2.5-1.2B-Instruct-GGUF -c 4096 --color -i
```

The `-hf` flag downloads the model directly from Hugging Face. Alternatively, use a local model file:
```bash
llama-cli -m lfm2.5-1.2b-instruct-q4_k_m.gguf -c 4096 --color -i
```

Key parameters:
* `-m`: Path to GGUF model file
* `-hf`: Hugging Face model ID (downloads automatically)
* `-m`: Path to local GGUF model file
* `-c`: Context length
* `--color`: Colored output
* `-i`: Interactive mode
@@ -242,43 +206,6 @@ Control text generation behavior using parameters in the OpenAI-compatible API o
* **`repetition_penalty`** / **`--repeat-penalty`** (`float`, default 1.1): Penalty for repeating tokens (>1.0 = discourage repetition). Typical range: 1.0-1.5
* **`stop`** (`str` or `list[str]`): Strings that terminate generation when encountered

<Accordion title="llama-cpp-python example">
```python
from llama_cpp import Llama

llm = Llama(
model_path="lfm2.5-1.2b-instruct-q4_k_m.gguf",
n_ctx=4096,
n_threads=8
)

# Text generation with sampling parameters
output = llm(
"What is machine learning?",
max_tokens=512,
temperature=0.7,
top_p=0.9,
top_k=40,
repeat_penalty=1.1,
stop=["<|im_end|>", "<|endoftext|>"]
)
print(output["choices"][0]["text"])

# Chat completion with sampling parameters
response = llm.create_chat_completion(
messages=[
{"role": "user", "content": "Explain quantum computing."}
],
temperature=0.7,
top_p=0.9,
top_k=40,
max_tokens=512,
repeat_penalty=1.1
)
print(response["choices"][0]["message"]["content"])
```
</Accordion>

<Accordion title="llama-server (OpenAI-compatible API) example">
```python
from openai import OpenAI
@@ -407,38 +334,6 @@ hf download LiquidAI/LFM2-VL-1.6B-GGUF mmproj-LFM2-VL-1.6B-Q8_0.gguf --local-dir
```
</Accordion>

<Accordion title="Using llama-cpp-python">
```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Initialize with vision support
# Note: Use the correct chat handler for your model architecture
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")

llm = Llama(
model_path="lfm2.5-vl-1.6b-q4_k_m.gguf",
chat_handler=chat_handler,
n_ctx=4096
)

# Generate with image
response = llm.create_chat_completion(
messages=[
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "file:///path/to/image.jpg"}},
{"type": "text", "text": "Describe this image."}
]
}
],
max_tokens=256
)
print(response["choices"][0]["message"]["content"])
```
</Accordion>

<Info>
For a complete working example with step-by-step instructions, see the [llama.cpp Vision Model Colab notebook](https://colab.research.google.com/drive/1q2PjE6O_AahakRlkTNJGYL32MsdUcj7b?usp=sharing).
</Info>
7 changes: 6 additions & 1 deletion docs/inference/transformers.mdx
@@ -14,9 +14,14 @@ Transformers provides the most flexibility for model development and is ideal fo
Install the required dependencies:

```bash
pip install transformers>=4.57.1 torch>=2.6
pip install "transformers>=5.0.0" torch
```

> **Note:** Transformers v5 is newly released. If you encounter issues, fall back to the pinned git source:
> ```bash
> pip install git+https://github.com/huggingface/transformers.git@0c9a72e4576fe4c84077f066e585129c97bfd4e6 torch
> ```

GPU is recommended for faster inference.
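
As a quick sanity check, something like the following confirms the installed Transformers version and whether a GPU is visible:

```python
# Quick environment check: Transformers version and GPU availability
import torch
import transformers

print(transformers.__version__)   # expect 5.x (or the pinned commit build)
print(torch.cuda.is_available())  # True if a CUDA GPU is usable
```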

## Basic Usage
7 changes: 7 additions & 0 deletions docs/inference/vllm.mdx
@@ -185,9 +185,16 @@ To use LFM Vision Models with vLLM, install the precompiled wheel along with the
VLLM_PRECOMPILED_WHEEL_COMMIT=72506c98349d6bcd32b4e33eec7b5513453c1502 VLLM_USE_PRECOMPILED=1 pip install git+https://github.com/vllm-project/vllm.git
```

```bash
pip install "transformers>=5.0.0" pillow
```

<Note>
Transformers v5 is newly released. If you encounter issues, fall back to the pinned git source:
```bash
pip install git+https://github.com/huggingface/transformers.git@3c2517727ce28a30f5044e01663ee204deb1cdbe pillow
```
</Note>

This installs vLLM with the necessary changes for LFM Vision Model support. Once these changes are merged upstream, you'll be able to use the standard vLLM installation.
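
Once installed, serving and querying a vision model typically goes through vLLM's OpenAI-compatible server. The sketch below is a rough illustration: the model ID, port, and image URL are assumptions, and it presumes the server was started with something like `vllm serve LiquidAI/LFM2-VL-1.6B`.

```python
# Sketch: query a vision model served by vLLM's OpenAI-compatible API
# (model ID, port, and image URL are placeholders)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="LiquidAI/LFM2-VL-1.6B",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            {"type": "text", "text": "Describe this image."},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```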
