diff --git a/docs/en/how_to/best-practices.mdx b/docs/en/how_to/best-practices.mdx
new file mode 100644
index 0000000..74aa922
--- /dev/null
+++ b/docs/en/how_to/best-practices.mdx
@@ -0,0 +1,833 @@
---
weight: 5
title: Best Practices
---

# Best Practices

## Overview

As the de facto standard for caching and key-value storage in cloud-native architectures, Redis handles core requirements for high-concurrency read/write operations and low latency. Running stateful Redis services in a Kubernetes containerized environment presents challenges distinct from traditional physical machine environments, including **persistence stability**, **dynamic network topology changes**, and **resource isolation and scheduling**.

This Best Practices document provides a standardized reference guide for Redis deployments in production environments. It covers full lifecycle management, from **architecture selection**, **resource planning**, and **client integration** to **observability and operations**. By following this guide, users can build an enterprise-class Redis data service that is **highly available (HA)**, **high-performance**, and **maintainable**.

## Architecture Selection

The Full Stack Cloud Native Open Platform offers two standard Redis management architectures based on customer business scale and SLA requirements:

### Sentinel Mode

**Positioning: Classic high-availability architecture, suitable for small to medium-scale businesses.**

Sentinel mode is based on Redis's native master-replica replication mechanism. By deploying independent Sentinel process groups to monitor the status of master and replica nodes, it automatically executes failover and notifies clients when the master node fails.
* **Pros**: Simple architecture, mature operations, lower requirements for client protocols.
* **Cons**: Write capacity is limited to a single node; storage capacity cannot scale horizontally.

### Cluster Mode

**Positioning: Distributed sharding architecture, suitable for large-scale, high-concurrency businesses.**

Cluster mode automatically shards data across multiple nodes using hash slots, enabling horizontal scaling (scale-out) of both storage capacity and read/write performance.
* **Pros**: Natively distributed, highly available storage; supports dynamic resharding.
* **Cons**: More complex client protocol; certain multi-key commands (e.g., `MGET`) are restricted by slot distribution.

### Selection Guide

When selecting a Redis architecture, consider business requirements for availability, scalability, and complexity.
| Feature | Sentinel Mode | Cluster Mode |
| :--- | :--- | :--- |
| **Scenarios** | Small/medium business, read-heavy/write-light, moderate data volume | Large business, high-concurrency reads/writes, massive data volume |
| **High Availability** | Via Sentinel monitoring and automatic failover | Via node auto-failure detection and recovery |
| **Scalability** | Vertical (scale-up); horizontal for read-only replicas | Horizontal for reads and writes; supports dynamic resharding |
| **Read/Write Separation** | Supported (client support required) | Supported (usually direct connection to shard masters; client support required) |
| **Data Sharding** | None (a single node stores the full dataset) | Yes (data is automatically sharded across multiple nodes) |
| **Ops Complexity** | Lower; simple architecture | Higher; involves sharding, hash slots, and slot migration |
| **Client Protocol** | Requires client support for the Sentinel protocol | Requires client support for the Cluster protocol |

**Recommendations:**
* If the data volume is small (fits in a single node's memory) and simplicity/stability is the priority, prefer **Sentinel Mode**.
* If the data volume is massive or write pressure is too high for a single node, choose **Cluster Mode**.

## Version Selection

Alauda Cache Service for Redis OSS currently supports the `5.0`, `6.0`, and `7.2` stable versions. All three versions have undergone complete automated testing and production verification.

**For new deployments, we strongly recommend choosing Redis `7.2`:**

1. **Lifecycle**
   * **`5.0` / `6.0`**: The community versions are End of Life (EOL) and no longer receive new features or security patches. Recommended only for compatibility with legacy applications.
   * **`7.2`**: As the current long-term support (LTS) version, it has the longest remaining lifecycle, ensuring operational stability and security updates for years to come.

2. **Compatibility**
   * Redis `7.2` maintains high compatibility with `5.0` and `6.0` data commands. Most business code can migrate smoothly without modification.
   * *Note*: The RDB persistence file format (v11) is not backward compatible (i.e., an RDB file generated by `7.2` cannot be loaded by `6.0`), but this does not affect new services.

3. **Key Features**
   * **ACL v2**: Provides granular access control (key-based permission selectors), significantly enhancing security in multi-tenant environments.
   * **Redis Functions**: Introduces a server-side scripting standard, resolving issues with Lua script loss and replication while keeping logic closer to the data.
   * **Sharded Pub/Sub**: Resolves the network storm issues caused by Pub/Sub broadcasting in Cluster mode, significantly improving messaging scalability via sharding.
   * **Performance Optimization**: Deep optimizations in data structures (especially Sorted Sets) and memory management provide higher throughput and lower latency.

> For more details on Redis 7.2 features, please refer to the official [Redis 7.2 Release Notes](https://github.com/redis/redis/blob/7.2/00-RELEASENOTES).

## Resource Planning

### Kernel Tuning

To ensure stability and high performance in production, the following kernel parameter optimizations are recommended at the Kubernetes node level (a sketch of how to apply them follows the list):

1. **Memory Allocation (`vm.overcommit_memory`)**
   * **Recommended**: `1`
   * **Explanation**: Setting this to `1` (always overcommit) ensures the kernel allows memory allocation during Redis fork operations (RDB snapshot/AOF rewrite), even if physical memory appears insufficient. This effectively prevents persistence failures due to allocation errors.

2. **Connection Queue (`net.core.somaxconn`)**
   * **Recommended**: `2048` or higher
   * **Explanation**: The Redis default `tcp-backlog` is 511. In high-concurrency scenarios, the system-level `net.core.somaxconn` should be increased to avoid dropping client connection requests.

3. **Transparent Huge Pages (THP)**
   * **Action**: **Disable** (`never`)
   * **Explanation**: THP causes significant latency spikes during memory allocation in Redis, especially during copy-on-write (CoW) after fork. It is recommended to disable it on the host or via startup scripts.
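
The snippet below is one minimal way to apply these settings on a node, assuming root access; in practice you would usually bake them into your node provisioning or node-tuning tooling, and the file name `/etc/sysctl.d/99-redis.conf` is only illustrative.

```bash
# Runtime changes (take effect immediately)
sysctl -w vm.overcommit_memory=1
sysctl -w net.core.somaxconn=2048
echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Persist the sysctl values across reboots
cat <<'EOF' > /etc/sysctl.d/99-redis.conf
vm.overcommit_memory = 1
net.core.somaxconn = 2048
EOF
sysctl --system
```

Note that the THP setting is not a sysctl and resets on reboot, so it typically needs to be reapplied from a boot script or a privileged DaemonSet.
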
### Memory Specifications

Redis uses a snapshot mechanism to asynchronously persist in-memory data to disk for long-term storage. This keeps Redis fast, but data written between snapshots can be lost.

In Kubernetes containerized environments, we recommend a tiered memory management strategy:
* **✅ Standard Specs (< 8GB)**: **Strongly Recommended**. Ensures extremely low fork latency and fast failure recovery (RTO < 60s); the most robust production choice.
* **⚠️ High-Performance Specs (8GB - 16GB)**: **Acceptable**. Requires a high-performance host and **THP must be disabled**. Fork time is controllable but may cause ~100ms jitter under high load.
* **❌ High-Risk Specs (> 16GB)**: **Not Recommended**. The impact of a single point of failure is too large, and full synchronization can easily saturate network bandwidth. Recommend splitting horizontally into Cluster mode.

#### Why Limit to 8GB?

While single instances on physical machines often run 32GB+, the 8GB limit in cloud-native environments is based on the following core technical factors:

1. **Fork Blocking & Page Table Copy**
   * Redis calls `fork()` during RDB generation and AOF rewrite. Although memory pages are shared copy-on-write, the **process page tables must be fully copied**, which blocks the **main thread**.
   * *Estimation*: 10GB of memory ≈ 20MB of page tables ≈ 10~50ms of blocking (depending on virtualization overhead). Beyond 8GB the blocking risk grows sharply, impacting the SLA.

2. **Failure Recovery Efficiency (RTO)**
   * Loading an RDB file after a container restart is a **single-threaded, CPU-bound task** (object deserialization). Tests show loading 8GB of data takes **30-50s** (even with SSDs). A 32GB instance could take several minutes to start, contradicting the Kubernetes "fast self-healing" philosophy.

#### Memory Configuration Best Practices

To avoid OOM kills during persistence caused by memory expansion, strictly adhere to these principles:

1. **Set MaxMemory**: Do not set `maxmemory` to 100% of the container memory limit. Recommend setting it to **70% ~ 80%** of the limit.
2. **Reserve CoW Space**: Redis forks a child process during RDB generation and AOF rewrite. Under heavy write load, the OS copy-on-write mechanism duplicates memory pages; in extreme cases memory usage can double, e.g., from 8GB to 16GB.
3. **Overcommit Config**: Ensure the host has `vm.overcommit_memory = 1` so the kernel allows the fork without requiring equivalent physical memory up front (relying on CoW), preventing fork failures.

> [!INFO]
>
> **Resource Reservation Formula**: `Container_Memory_Limit` ≈ `Redis_MaxMemory` / 0.7
> * Example: To store 8GB of data, configure the container memory limit to 10GB ~ 12GB, leaving 2GB+ for CoW and fragmentation overhead.

### CPU Resources

Redis executes commands on a single main thread, but persistence (fork) and other background operations require child processes. Therefore, allocate **at least 2 cores** per Redis instance:
* **Core 1**: Handles main-thread requests and commands.
* **Core 2**: Handles the persistence fork, background tasks, and system overhead.

#### Multi-threading

Redis 6.0+ introduced multi-threaded I/O (disabled by default) to overcome the single-threaded network I/O bottleneck.

* **When to Enable?**
  * **Bottleneck Analysis**: When Redis CPU usage nears 100% and analysis shows the time is spent on kernel-space network I/O (system CPU) rather than user-space command execution.
  * **Traffic Profile**: Typically beneficial when single-instance QPS > 80,000 or network traffic is very high (> 1GB/s).
  * **Resource Conditions**: Ensure the node has sufficient CPU cores (at least 4).

* **Configuration Best Practices**:
  * **Thread Count**: Recommend 4~8 I/O threads. Exceeding 8 threads rarely yields significant gains.
  * **Config Example**:
    ```yaml
    io-threads 4
    io-threads-do-reads yes
    ```
  * **Note**: Multi-threaded I/O only improves network throughput; it does **NOT** improve the execution speed of a single complex command (e.g., `SORT`, `KEYS`).

### Storage Planning

#### Capacity Planning

The persistence mode directly determines the disk quota requirement. Refer to the following formulas:

| Mode | Recommended Quota Formula | Details |
| :--- | :--- | :--- |
| **Diskless (Cache)** | `0` (No PVC) | Used as a pure cache with no RDB/AOF. Logs are collected via stdout in Kubernetes, so no persistence disk is needed. |
| **RDB (Snapshot)** | `MaxMemory * 2` | RDB uses CoW. While a snapshot is being generated, both the "old snapshot" and the "new snapshot being written" exist on disk.<br />**Recommendation**: Reserve at least 2x the memory size. |
| **AOF (Append Only)** | `MaxMemory * 3` | The AOF grows with write operations. The default config (`auto-aof-rewrite-percentage 100`) triggers a rewrite when the AOF reaches **2x** the data size. The disk must hold:<br />1. The old AOF file (2x)<br />2. The new AOF file produced by the rewrite (1x)<br />**Peak total: 3x**. Recommend reserving at least 3x the memory size. |

For example, an instance with `maxmemory` of 8GB needs no PVC when diskless, at least a 16GB PVC with RDB, and at least a 24GB PVC with AOF.

#### Performance Requirements
* **With AOF**: Disk performance is critical. Insufficient IOPS or high fsync latency will directly block the main thread (when `appendfsync everysec` is used).
* **Media**: Production environments strongly recommend SSD/NVMe local disks or high-performance cloud disks.

### Parameter Configuration

Alauda Cache Service for Redis OSS parameters are specified via Custom Resource (CR) fields.

#### Built-in Templates

Alauda Cache Service for Redis OSS provides multiple parameter templates for different business scenarios. Selection depends on the trade-off between persistence (Diskless/AOF/RDB) and performance.

| Template Name | Description | Scenarios | Risks |
| :--- | :--- | :--- | :--- |
| **rdb-redis-<version>-<sentinel\|cluster>** | Enables RDB persistence; periodic snapshots to disk. | **Balanced**: Limited resources, balances performance and reliability, accepts minute-level data loss. | Data loss depends on the `save` config; usually a minute-level RPO. |
| **aof-redis-<version>-<sentinel\|cluster>** | Enables AOF persistence; logs every write operation. | **Secure**: Ample resources, high data security (second-level loss), slight performance compromise. | Frequent fsync requires high-performance storage; high I/O pressure. |
| **diskless-redis-<version>-<sentinel\|cluster>** | Disables persistence; pure in-memory. | **High-Performance Cache**: Acceleration only; data loss is acceptable or rebuildable from the source. | A restart or failure leads to **full data loss**. |

> `<version>` represents the Redis version, e.g., `6.0`, `7.2`.

Key parameter differences:

| Parameter | RDB Template | AOF Template | Diskless Template | Explanation |
| :--- | :--- | :--- | :--- | :--- |
| `appendonly` | `no` | `yes` | `no` | Enable AOF logging. |
| `save` | `60 10000 300 100 600 1` | `""` (Disabled) | `""` (Disabled) | RDB snapshot triggers. |
| `repl-diskless-sync` | `no` | `no` | `yes` | Master-replica full sync via socket without touching disk. |
| `repl-diskless-sync-delay` | `5` | `5` | `0` | Delay before diskless sync; 0 for the Diskless template to speed up sync. |

##### Persistence Selection Recommendations

1. **Pure Cache**: Choose the **Diskless template**. Data is rebuildable, there is no persistence overhead, and performance is best.
2. **General Business**: Choose the **RDB template**. Periodic snapshots provide a minute-level RPO with moderate resource usage.
3. **Financial/High-Reliability**: Choose the **AOF template** with `appendfsync everysec` for second-level protection.

> [!WARNING]
> **Should RDB + AOF be enabled simultaneously?**
>
> Redis supports running RDB and AOF together, but it is **generally not recommended** in Kubernetes:
> * **Performance**: AOF fsync already creates I/O pressure; adding RDB fork + disk writes significantly increases resource contention.
> * **Storage Doubling**: Requires space for both RDB snapshots and AOF files, complicating PVC planning.
> * **Recovery Priority**: Redis loads the AOF first on start (more complete data); the RDB acts only as a backup, offering limited benefit.
> * **Platform Backup**: Alauda Cache Service for Redis OSS provides independent automatic/manual backups, removing the reliance on RDB snapshots as extra insurance.
>
> **Recommendation**: Choose a **single persistence mode** (RDB or AOF) based on your needs, and use platform backups for disaster recovery. If mixed mode is truly necessary, ensure sufficient storage IOPS (SSD) and reserve disk space of 5x the data volume.
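
If you are unsure which persistence mode an existing instance is actually running, one quick check is to read the effective configuration from inside a data Pod. This is only a sketch: the namespace, Pod name, and password are placeholders you need to substitute (data Pods follow the platform's `rfr-`/`drc-` naming, and the Redis container is named `redis`).

```bash
# AOF enabled?  -> "appendonly" = yes/no
kubectl -n <namespace> exec <redis-pod> -c redis -- redis-cli -a <password> CONFIG GET appendonly

# RDB snapshot schedule -> an empty "save" value means RDB snapshots are disabled
kubectl -n <namespace> exec <redis-pod> -c redis -- redis-cli -a <password> CONFIG GET save
```
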
+ +#### Parameter Update + +Redis parameters are categorized by application method: + +| Category | Parameters | Behavior | +| :--- | :--- | :--- | +| **Hot Update** | Most runtime params (`maxmemory`, `loglevel`, etc.) | **Immediate effect** after modification, no restart. | +| **Restart Update** | `databases`, `rename-command`, `rdbchecksum`, `tcp-backlog`, `io-threads`, `io-threads-do-reads` | Requires **Instance Restart** to take effect. | +| **Immutable** | `bind`, `protected-mode`, `port`, `supervised`, `pidfile`, `dir`, etc. | Managed by system, modification may cause anomalies. | + +> [!TIP] +> Always assume data backup before modifying parameters requiring restart. + +#### Modification Examples + +**Update Data Node Parameters**: Configure via `spec.customConfig`. + +```bash +# Example: Modify save strategy (Hot update) +kubectl -n patch redis --type=merge --patch='{"spec": {"customConfig": {"save":"600 1"}}}' +``` + +**Update Sentinel Node Parameters**: Configure via `spec.sentinel.monitorConfig`. +> Currently supports `down-after-milliseconds`, `failover-timeout`, `parallel-syncs`. + +```bash +# Example: Modify failover timeout +kubectl -n patch redis --type=merge --patch='{"spec": {"sentinel": {"monitorConfig": {"down-after-milliseconds":"30000"}}}}' +``` + +### Resource Specs + +Deploy resources according to your actual business scenario. + +#### Sentinel Mode Specs + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Persistence | Template | Instance Spec | Replica / Sentinel | Sentinel Pod | redis-exporter | redis (Spec) | Backup Pod | Total Resources | Storage Quota | Auto Backup (Keep 7) | Manual Backup (Keep 7) |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| AOF | `aof-redis-<version>-sentinel` | 2c4g | 1 / 3 | 100m128Mi | 100m200Mi | 2c4g | Unlimited (Reserve resources) | 4.5c4.8G | Evaluate based on actual write volume | Evaluate based on actual write volume | Evaluate based on actual write volume |
| AOF | `aof-redis-<version>-sentinel` | 4c8g | 1 / 3 | 100m128Mi | 100m200Mi | 4c8g | Unlimited (Reserve resources) | 8.5c8.8G | Evaluate based on actual write volume | Evaluate based on actual write volume | Evaluate based on actual write volume |
| RDB | `rdb-redis-<version>-sentinel` | 2c4g | 1 / 3 | 100m128Mi | 100m200Mi | 2c4g | Unlimited (Reserve resources) | 4.5c4.8G | 8G | 28G | 28G |
| RDB | `rdb-redis-<version>-sentinel` | 4c8g | 1 / 3 | 100m128Mi | 100m200Mi | 4c8g | Unlimited (Reserve resources) | 8.5c8.8G | 16G | 56G | 56G |
| Diskless | `diskless-redis-<version>-sentinel` | 2c4g | 1 / 3 | 100m128Mi | 100m200Mi | 2c4g | Unlimited (Reserve resources) | 4.5c4.8G | / | 28G | 28G |
| Diskless | `diskless-redis-<version>-sentinel` | 4c8g | 1 / 3 | 100m128Mi | 100m200Mi | 4c8g | Unlimited (Reserve resources) | 8.5c8.8G | / | 56G | 56G |

#### Cluster Mode Specs
| Persistence | Template | Instance Spec | Sharding / Replica | redis-exporter | redis (Spec) | Backup Pod | Total Resources | Storage Quota | Auto Backup (Keep 7) | Manual Backup (Keep 7) |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| AOF | `aof-redis-<version>-cluster` | 2c4g | 3 / 1 | 100m300Mi | 2c4g | Unlimited (Reserve resources) | 12.6c25.8G | Evaluate based on actual write volume | Evaluate based on actual write volume | Evaluate based on actual write volume |
| AOF | `aof-redis-<version>-cluster` | 4c8g | 3 / 1 | 100m300Mi | 4c8g | Unlimited (Reserve resources) | 24.6c49.8G | Evaluate based on actual write volume | Evaluate based on actual write volume | Evaluate based on actual write volume |
| RDB | `rdb-redis-<version>-cluster` | 2c4g | 3 / 1 | 100m300Mi | 2c4g | Unlimited (Reserve resources) | 12.6c25.8G | 24G | 84G | 84G |
| RDB | `rdb-redis-<version>-cluster` | 4c8g | 3 / 1 | 100m300Mi | 4c8g | Unlimited (Reserve resources) | 24.6c49.8G | 48G | 168G | 168G |
| Diskless | `diskless-redis-<version>-cluster` | 2c4g | 3 / 1 | 100m300Mi | 2c4g | Unlimited (Reserve resources) | 12.6c25.8G | / | 84G | 84G |
| Diskless | `diskless-redis-<version>-cluster` | 4c8g | 3 / 1 | 100m300Mi | 4c8g | Unlimited (Reserve resources) | 24.6c49.8G | / | 168G | 168G |

> `<version>` represents the Redis version, e.g., `6.0`, `7.2`.

## Scheduling

Alauda Cache Service for Redis OSS offers flexible scheduling strategies, supporting node selection, taint toleration, and various anti-affinity configurations to meet high availability needs in different resource environments.

### Node Selection

You can use the `spec.nodeSelector` field to specify which nodes Redis Pods should be scheduled on. This is typically used with Kubernetes node labels to isolate database workloads onto dedicated node pools.

> [!WARNING]
> **Persistence Limitation**: If your Redis instance mounts **non-network storage** (e.g., Local PV) PVCs, be cautious when updating `nodeSelector`. Since local data resides on specific nodes and cannot migrate with the Pods, the updated `nodeSelector` **MUST include the node where the Pod currently resides**. If the original node is excluded, the Pod will fail to access its data or fail to start. Network storage (Ceph RBD, NFS) follows the Pod and is not subject to this restriction.

### Taint Toleration

Use `spec.tolerations` to allow Redis Pods to tolerate node taints. This allows deploying Redis on dedicated nodes with specific taints (e.g., `key=redis:NoSchedule`), preventing other non-critical workloads from preempting resources.

### Anti-Affinity

To prevent single points of failure, Alauda Cache Service for Redis OSS provides anti-affinity configuration. The configuration differs by architecture mode.

> [!CAUTION]
> **Immutable**: To ensure consistency and reliability, anti-affinity configurations (both `affinityPolicy` and `affinity`) **cannot be modified** after instance creation. Please plan ahead.

#### Cluster Mode

In Cluster mode, the system **prioritizes `spec.affinityPolicy`**. Alauda Cache Service for Redis OSS uses this enum to abstract complex topology rules, automatically generating affinity rules for each shard's StatefulSet.

* **Priority**: `spec.affinityPolicy` > `spec.affinity`.
* **If `affinityPolicy` is unset**: Alauda Cache Service for Redis OSS falls back to `spec.affinity`. If you need custom topology rules beyond the enums below, leave `affinityPolicy` empty and configure native `spec.affinity`.

| Policy Name | `affinityPolicy` Value | Behavior | Pros / Cons | Scenario |
| :--- | :--- | :--- | :--- | :--- |
| **All Pods Forced Anti-Affinity** | `AntiAffinity` | Forces ALL Pods in the cluster (including primaries/replicas of different shards) to be on different nodes. Fails if node count < total Pod count. | **Pros**: Highest disaster recovery, minimal single-node failure impact.<br />**Cons**: Extremely high resource requirement; node count must be >= total Pods. | **Cluster Mode Core Business**<br />Ample resources, strict HA requirements. |
| **Shard Primary-Replica Forced Anti-Affinity** | `AntiAffinityInSharding` | Forces the primary and replicas within the same shard to be on different nodes. Pods from different shards can coexist. | **Pros**: Guarantees physical isolation of data replicas, preventing data loss during shard migration.<br />**Cons**: Scheduling fails if live nodes < replica count. Primaries of different shards might land on the same node (single point of failure risk). | **Production Standard**<br />Balances resource usage and data safety. |
| **Shard Primary-Replica Soft Anti-Affinity** | `SoftAntiAffinity` | Prioritizes spreading a shard's primary/replicas. If impossible (e.g., insufficient nodes), allows scheduling on the same node. | **Pros**: Highest deployment success rate, runs with limited resources.<br />**Cons**: Primary/replica may share a node in extreme cases, risking data loss. | **Test/Dev Environments**<br />Or resource-constrained edge environments. |
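
To tie the scheduling options together, the fragment below sketches how the fields discussed in this section might sit in an instance's `spec`. It is illustrative only: the node label and taint key/value are placeholders for whatever convention your clusters use, and `affinityPolicy` applies to Cluster mode (remember that anti-affinity settings cannot be changed after creation).

```yaml
spec:
  # Pin Redis Pods to a dedicated node pool (label is an example).
  nodeSelector:
    node-role.kubernetes.io/redis: ""
  # Tolerate the taint that keeps other workloads off those nodes (taint is an example).
  tolerations:
    - key: redis
      operator: Equal
      value: "true"
      effect: NoSchedule
  # Recommended production default for Cluster mode: the primary and replicas of the
  # same shard are forced onto different nodes.
  affinityPolicy: AntiAffinityInSharding
```
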
+ +#### Sentinel Mode + +> **Important** +> **Sentinel Mode does not support `spec.affinityPolicy`**. + +For Sentinel mode, Redis Data Nodes and Sentinel Nodes require separate Kubernetes native Affinity rules: + +* **Redis Data Nodes**: Configured via **`spec.affinity`**. +* **Sentinel Nodes**: Configured via **`spec.sentinel.affinity`**. + +You need to manually write complete `Affinity` rules. Example for forcing anti-affinity for both Data and Sentinel nodes: + +```yaml +spec: + affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchExpressions: + - key: app.kubernetes.io/component + operator: In + values: + - redis + - key: redisfailovers.databases.spotahome.com/name + operator: In + values: + - + topologyKey: kubernetes.io/hostname + sentinel: + affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchExpressions: + - key: app.kubernetes.io/component + operator: In + values: + - sentinel + - key: redissentinels.databases.spotahome.com/name + operator: In + values: + - + topologyKey: kubernetes.io/hostname +``` + +To force anti-affinity across ALL nodes (Data + Sentinel), refer to: + +```yaml +spec: + affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchExpressions: + - key: middleware.instance/type + operator: In + values: + - redis-failover + - key: middleware.instance/name + operator: In + values: + - + topologyKey: kubernetes.io/hostname + sentinel: + affinity: + podAntiAffinity: + requiredDuringSchedulingIgnoredDuringExecution: + - labelSelector: + matchExpressions: + - key: middleware.instance/type + operator: In + values: + - redis-failover + - key: middleware.instance/name + operator: In + values: + - + topologyKey: kubernetes.io/hostname +``` + +## User Management + +Alauda Cache Service for Redis OSS (v6.0+) provides declarative user management via `RedisUser` CRD, supporting ACLs. + +> [!TIP] +> **Compatibility**: Redis 5.0 only supports single-user auth; Redis 6.0+ implements full ACLs for multi-user/granular control. + +### Permission Profiles + +The platform pre-defines permission profiles for common scenarios: + +| Profile | ACL Rule | Explanation | +| :--- | :--- | :--- | +| **NotDangerous** | `+@all -@dangerous ~*` | Allows all commands except dangerous ones (e.g., `FLUSHDB`). | +| **ReadWrite** | `-@all +@write +@read -@dangerous ~*` | Allows read/write, blocks dangerous ops. | +| **ReadOnly** | `-@all +@read -keys ~*` | Allows read-only operations. | +| **Administrator** | `+@all -acl ~*` | Admin privileges, allows all commands except ACL management. | + +For custom ACLs, see [Redis ACL Documentation](https://redis.io/topics/acl). + +### Security Mechanisms + +1. **ACL Force Revocation**: All `RedisUser` creation/updates undergo Webhook validation to **force remove `acl` permissions**, preventing privilege escalation. +2. **Cluster Command Injection**: For **Cluster Mode**, Alauda Cache Service for Redis OSS automatically injects topology commands: `cluster|slots`, `cluster|nodes`, `cluster|info`, `cluster|keyslot`, `cluster|getkeysinslot`, `cluster|countkeysinslot` to ensure client awareness. +3. **6.0 -> 7.2 Upgrade Compatibility**: When upgrading 6.0 -> 7.2, the operator adds `&*` (Pub/Sub Channel) permission to ensure consistency with 7.x's new Channel ACLs. + +### System Account + +Each Redis instance automatically generates a system account named `operator`. Its roles include: + +1. 
**Cluster Init**: Slot assignment, node joining. +2. **Config Simplification**: Unified system account reduces user configuration complexity. +3. **Operations**: Used for health checks, failovers, scaling. +4. **Avoid Restarts**: Password updates for business users don't affect this account, avoiding restarts. + +> [!CAUTION] +> `operator` is a **Reserved System Account**: +> * **Complexity**: Random 64-char string (alphanumeric+special). +> * **Privilege**: Highest level (includes user management). +> * **Restriction**: **No online password update** and **DO NOT manually modify/delete**, as it may cause irreversible failure. + +### Production Best Practices + +1. **App Isolation**: Create **independent user accounts** for each app/microservice. Avoid sharing accounts to enable auditing and isolation. +2. **Principle of Least Privilege**: + * **Read-Only App**: Use `ReadOnly`. + * **Read-Write App**: Use `ReadWrite`. + * **Ops Tools**: Use `NotDangerous` or custom permissions. + * **Avoid `Administrator`**: Unless absolutely necessary. +3. **Key Namespace Isolation**: Combine ACL Key patterns (e.g., `~app1:*`) to restrict apps to specific key prefixes. +4. **Password Rotation**: Establish mechanisms to regularly rotate app passwords. + +For operation steps, see [User Management Docs](../functions/20-user.mdx). + +## Client Access + +### Topology Discovery + +Both **Sentinel** and **Cluster** modes rely on clients actively discovering and connecting to data nodes, differing from traditional LB proxy modes: + +#### Sentinel Mode + +1. Client connects to **Sentinel Node**. +2. Client sends `SENTINEL get-master-addr-by-name mymaster` to get Master **IP/Port**. +3. Client **directly connects** to Master. +4. On failover, Sentinel notifies client (or client polls) to switch to new Master. + +#### Cluster Mode + +1. Client connects to any **Cluster Node**. +2. Sends `CLUSTER SLOTS` / `CLUSTER NODES` to get **Slot Distribution**. +3. Calculates hash slot for Key and **directly connects** to target node. +4. If slot migrates, node returns `MOVED`/`ASK`; client must refresh topology. + +Both protocols return **Real Node IPs**. If a reverse proxy (HAProxy/Nginx) is used, clients still get backend real IPs, which may be unreachable from outside the cluster. +Thus, **Each Redis Pod needs an independent external address** (NodePort/LoadBalancer), not a single proxy address. + +### Network Access Strategies + +Alauda Cache Service for Redis OSS supports multiple access methods: + +#### Sentinel Mode + +| Method | Recommended | Description | +| :--- | :--- | :--- | +| **ClusterIP** | ✅ **Internal Preferred** | Access Sentinel via K8s Service (port 26379). Clients auto-discover Master. Lowest latency, highest security. | +| **LoadBalancer** | ✅ **External Preferred** | Exposes Sentinel via MetalLB/Cloud LB. Stable external entry, no port management. | +| **NodePort** | ⚠️ External Backup | Exposes Sentinel via Node ports. Requires manual port management, risky, potential multi-NIC binding issues. | + +#### Cluster Mode + +| Method | Recommended | Description | +| :--- | :--- | :--- | +| **ClusterIP** | ✅ **Internal Preferred** | Access via K8s Service. Client must support Cluster protocol. | +| **LoadBalancer** | ✅ **External Preferred** | Configure LB for each shard Master. Stable external access. Client must handle MOVED/ASK. | +| **NodePort** | ⚠️ External Backup | Expose underlying Pod NodePorts. Client connects directly. Complex port management. 

> [!WARNING]
> **NodePort Notes**:
> * **Port Management**: The port range is limited (30000-32767), so conflicts are likely when running multiple instances.
> * **Security**: Increases the attack surface.
> * **Multi-NIC**: Redis binds to the default NIC; clients may fail to connect if the advertised IPs do not match.
> * **No LB Proxy**: The Sentinel/Cluster protocols require direct node connections and cannot be proxied by standard load balancers.

> [!INFO]
> **Resource Usage**: LB/NodePort access creates a **Service per Pod**.
> * **Sentinel** (1P1R + 3 Sentinels): Needs **8 NodePorts/LBs**.
> * **Cluster** (3 Shards x 1P1R): Needs **7 NodePorts/LBs**.

### Code Examples

We provide best practice examples for **go-redis**, **Jedis**, **Lettuce**, and **Redisson**:

* **Sentinel Access**: [How to Access Sentinel Instance](./access/10-sentinel.mdx)
* **Cluster Access**: [How to Access Cluster Instance](./access/20-cluster.mdx)

> [!INFO]
> **Master Group Name**: In Sentinel mode, the master name is fixed to `mymaster`.

### Client Reliability Best Practices

1. **Timeouts**
   * **Connect Timeout**: Keep it distinct from the read timeout. Recommend 1-3s.
   * **Read/Write Timeout**: Based on your SLA, usually hundreds of milliseconds.

2. **Retry Strategy**
   * **Exponential Backoff**: Do not retry immediately on failure; use backoff (100ms, 200ms, ...) to avoid retry storms.

3. **Connection Pooling**
   * **Reuse**: Always use pooling (JedisPool, the go-redis pool) to save handshake costs.
   * **Max Connections**: Set `MaxTotal` (or the equivalent pool size) reasonably to avoid hitting the Redis `maxclients` limit.

4. **Topology Refresh (Cluster)**
   * **Auto-refresh**: Ensure the client handles `MOVED`/`ASK` redirects.
   * **Periodic refresh**: In unstable or scaling environments, configure periodic topology refresh (e.g., every 60s) to proactively detect changes.
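
The sketch below pulls these recommendations together for go-redis (v9) against a Sentinel-mode instance. The addresses, password, and the specific timeout/pool/retry numbers are placeholders to be tuned against your own SLA, not platform defaults.

```go
package main

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	rdb := redis.NewFailoverClient(&redis.FailoverOptions{
		MasterName:    "mymaster",                           // fixed master group name in Sentinel mode
		SentinelAddrs: []string{"<sentinel-address>:26379"}, // placeholder Sentinel endpoint
		Password:      "<password>",

		// Timeouts: keep the connect timeout separate from read/write timeouts.
		DialTimeout:  2 * time.Second,
		ReadTimeout:  500 * time.Millisecond,
		WriteTimeout: 500 * time.Millisecond,

		// Connection pool: keep total connections well below the server's maxclients.
		PoolSize:     50,
		MinIdleConns: 5,

		// Retries with exponential backoff to avoid retry storms.
		MaxRetries:      3,
		MinRetryBackoff: 100 * time.Millisecond,
		MaxRetryBackoff: 800 * time.Millisecond,
	})
	defer rdb.Close()

	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	if err := rdb.Ping(ctx).Err(); err != nil {
		panic(err)
	}
}
```

For Cluster mode, `redis.NewClusterClient` with `redis.ClusterOptions` exposes the same timeout, pool, and retry fields, and go-redis refreshes the slot topology automatically when it receives `MOVED`/`ASK` redirects.
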
+* **Unit**: Count +* **Expr**: + ```text + sum by(namespace,service) (changes((sum by(namespace,service,pod)(redis_instance_info{namespace=~"{{.namespace}}",pod=~"(drc|rfr)-({{.name}})-.*",role="master"}) OR (sum by(namespace,service,pod)(redis_instance_info{namespace=~"{{.namespace}}",pod=~"(drc|rfr)-({{.name}})-.*",}) * 0))[5m:10s])) + ``` + +##### Instance Status + +* **Desc**: Health status. 0 = Abnormal. +* **Expr**: + ```text + ((count by(namespace,service)(redis_instance_info{namespace=~"{{.namespace}}",service=~"{{.name}}",redisarch="cluster"}) % count by(namespace,service)(redis_instance_info{namespace=~"{{.namespace}}",service=~"{{.name}}",redisarch="cluster",role="master"})) == bool 0 and count by(namespace,service)(redis_instance_info{namespace=~"{{.namespace}}",service=~"{{.name}}",redisarch="cluster",role="master"}) >= bool 3) or (count by(namespace,service)(redis_instance_info{namespace=~"{{.namespace}}",service=~"rfr-({{.name}})",redisarch="sentinel",role="master"})) > bool 0 + ``` + +##### Node Input Bandwidth + +* **Desc**: Peak ingress traffic. +* **Unit**: Bps +* **Expr**: + ```text + max by(namespace,service)(irate(redis_net_input_bytes_total{namespace=~"{{.namespace}}", pod=~"(drc|rfr)-({{.name}})-.*"}[5m])) + ``` + +##### Node Output Bandwidth + +* **Desc**: Peak egress traffic. +* **Unit**: Bps +* **Expr**: + ```text + max by(namespace,service)(irate(redis_net_output_bytes_total{namespace=~"{{.namespace}}", pod=~"(drc|rfr)-({{.name}})-.*"}[5m])) + ``` + +##### Node Connections + +* **Desc**: Peak client connections. Watch if near `maxclients`. +* **Unit**: Count +* **Expr**: + ```text + max by(namespace,service)(redis_connected_clients{namespace=~"{{.namespace}}",pod=~"(drc|rfr)-({{.name}})-.*"}) + ``` + +##### CPU Usage + +* **Desc**: Node CPU usage. Sustained high = perf impact. +* **Unit**: % +* **Expr**: + ```text + avg by(namespace,pod_name)(irate(container_cpu_usage_seconds_total{namespace=~"{{.namespace}}",pod_name=~"(drc|rfr)-({{.name}})-.*",container_name="redis"}[5m]))/avg by(namespace,pod_name)(container_spec_cpu_quota{namespace=~"{{.namespace}}",pod_name=~"(drc|rfr)-({{.name}})-.*",container_name="redis"})*100000 + ``` + +##### Memory Usage + +* **Desc**: Node memory usage. >80% suggest scaling. +* **Unit**: % +* **Expr**: + ```text + avg by(namespace,pod_name)(container_memory_usage_bytes{namespace=~"{{.namespace}}", pod_name=~"(drc|rfr)-({{.name}})-.*",container_name="redis"} - container_memory_cache{namespace=~"{{.namespace}}", pod_name=~"(drc|rfr)-({{.name}})-.*",container_name="redis"}) / avg by(namespace,pod_name)(container_spec_memory_limit_bytes{namespace=~"{{.namespace}}", pod_name=~"(drc|rfr)-({{.name}})-.*",container_name="redis"}) + ``` + +##### Storage Usage + +* **Desc**: PVC usage. Full = persistence failure. +* **Unit**: % +* **Expr**: + ```text + avg(kubelet_volume_stats_used_bytes{namespace=~"{{.namespace}}",persistentvolumeclaim=~"redis-data-(drc|rfr)-({{.name}})-.*"}) by(namespace,persistentvolumeclaim) / avg(kubelet_volume_stats_capacity_bytes{namespace=~"{{.namespace}}",persistentvolumeclaim=~"redis-data-(drc|rfr)-({{.name}})-.*"}) by(namespace,persistentvolumeclaim) + ``` + +#### Key Metrics & Alert Recommendations + +Recommended production alerts: + +| Metric | Threshold | Note | +| :--- | :--- | :--- | +| **Memory Usage** | > 80% | Risk of eviction/OOM. | +| **CPU Usage** | > 80% (Sustained) | Latency spikes. | +| **Hit Rate** | < 80% | Strategy issue or capacity missing. | +| **Failovers** | > 0 | Check network/node health. 
| **Connections** | Near `maxclients` | New connections rejected. |
| **Storage Usage** | > 80% | Ensure space for AOF/RDB. |
| **Response Time** | > 10ms | Slow queries/bottlenecks. |

### Troubleshooting

For specific issues, search the [Customer Portal](https://cloud.alauda.cn/kb).

## References

* [High availability with Redis Sentinel](https://redis.io/docs/latest/operate/oss_and_stack/management/sentinel/)
* [Redis cluster specification](https://redis.io/docs/latest/operate/oss_and_stack/reference/cluster-spec/)
* [Redis persistence](https://redis.io/docs/latest/operate/oss_and_stack/management/persistence/)
* [Scale with Redis Cluster](https://redis.io/docs/latest/operate/oss_and_stack/management/scaling/)
* [Optimizing Redis](https://redis.io/docs/latest/operate/oss_and_stack/management/optimization/)