HeartMuLa · OneMonkeyArmy · Feb 8, 2026
diff --git a/ADV_PARAMETERS_GUIDE.md b/ADV_PARAMETERS_GUIDE.md
@@ -0,0 +1,83 @@
+# HeartMuLa - Advanced Parameters Guide
+
+**Disclaimer:**
+> The **Default** values listed below are taken from the official HeartMuLa research paper. The paper does not specify absolute minimum or maximum limits.
+> The *adjustments* and "recipes" are derived from standard Large Language Model (LLM) behavior and community experimentation. While HeartMuLa is based on Llama-3.2, results may vary depending on the specific model checkpoint.
+
+---
+
+## The "Mellow Bias": Why your Techno sounds like Pop
+Many users report that the model ignores aggressive tags (e.g., *Techno, Metal, High Energy*) and defaults to a "mellow" or "pop" sound.
+
+**The Likely Cause:**
+The HeartMuLa training dataset was filtered using **Audiobox-Aesthetic** scores to ensure high fidelity.
+* **Premise:** Aesthetic scorers tend to rate "clean" audio higher, so filtering on their scores favors clean recordings.
+* **The Side Effect:** In generative audio, "clean" often correlates with Pop, Acoustic, or Soft textures, while "distorted" or "noisy" genres (like Dubstep or Rock) can be penalized.
+* **The Result:** When the model is unsure, it drifts toward the "safe" aesthetic (Mellow/Pop).
+
+To counteract this bias, adjust the **Inference Parameters** to push the model away from its "safe" center.
+
+---
+
+## 1. Classifier-Free Guidance (`--cfg_scale`)
+**The "Strictness" Slider.**
+This parameter controls how strongly the model forces the audio to match your text prompt (Tags/Lyrics) versus its internal training distribution.
+
+* **Paper Default:** `1.5`
+* **How it works (General AI Logic):**
+    * **Low (1.0 - 1.5):** The model is "loosely" guided by your tags. It prioritizes the internal "aesthetic" bias (smooth/safe audio).
+    * **High (2.0 - 4.0):** The model is "forced" to match your tags, even if the result is less "aesthetically safe."
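+
+The slider behavior described above can be sketched in a few lines. This is a minimal illustration that assumes the common logit-space formulation of classifier-free guidance; the guide does not confirm that HeartMuLa blends its outputs exactly this way, and `apply_cfg` is a hypothetical helper, not part of the project.
+
+```python
+def apply_cfg(cond_logits, uncond_logits, cfg_scale):
+    """Blend prompt-conditioned and unconditioned logits (assumed standard CFG form)."""
+    # Start from the unconditioned prediction and move toward (or past) the
+    # conditioned one. A larger cfg_scale exaggerates the difference the prompt makes.
+    return [u + cfg_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]
+
+
+cond = [2.0, 0.5, -1.0]    # toy logits with the tags applied
+uncond = [1.0, 1.0, 0.0]   # toy logits without the tags
+
+print(apply_cfg(cond, uncond, 1.0))  # [2.0, 0.5, -1.0]
+print(apply_cfg(cond, uncond, 3.0))  # [4.0, -0.5, -3.0]
+```
+
+Under this assumed formulation, a scale of `1.0` reproduces the conditioned logits unchanged, and higher values widen the gap between tokens the prompt favors and tokens it disfavors.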
+
+**Community Observation:**
+Users have reported that the default `1.5` is often too weak for specific genres. Increasing this value has helped users generate genres like R&B that previously came out as Pop.
+
+**Recommendation:**
+If your genre is being ignored, **increase** this value. Start with `2.5` and go up to `4.0` if needed.
+
+---
+
+## 2. Temperature (`--temperature`)
+**The "Randomness" Slider.**
+This rescales the probability distribution over the next audio token before sampling: values below `1.0` sharpen it, and values above `1.0` flatten it.
+
+* **Paper Default:** `1.0`
+* **How it works (General AI Logic):**
+    * **Lower (< 1.0):** The model becomes "conservative." Probability concentrates on the most likely sounds, which usually results in more repetitive, structured, and coherent rhythms.
+    * **Higher (> 1.0):** The model becomes "creative" and takes risks. This adds variety but increases the chance of chaos or the melody falling apart.
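+
+To make the effect concrete, the sketch below applies temperature to a toy set of logits. It assumes the usual approach of dividing logits by the temperature before the softmax; `softmax_with_temperature` is a hypothetical helper written for this guide, not HeartMuLa code.
+
+```python
+import math
+
+
+def softmax_with_temperature(logits, temperature):
+    """Turn logits into probabilities after dividing them by the temperature."""
+    scaled = [x / temperature for x in logits]
+    peak = max(scaled)  # subtract the maximum for numerical stability
+    exps = [math.exp(x - peak) for x in scaled]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+
+logits = [2.0, 1.0, 0.0]  # toy scores for three candidate audio tokens
+for t in (0.8, 1.0, 1.1):
+    probs = softmax_with_temperature(logits, t)
+    print(t, [round(p, 3) for p in probs])
+```
+
+The top token's probability is highest at `0.8` and lowest at `1.1`, which is the "conservative" versus "creative" trade-off described above.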
+
+**Recommendation:**
+For genres requiring strict rhythm (Techno, House), try **lowering** this slightly to `0.8` or `0.9` to lock in the groove.
+
+---
+
+## 3. Top-K (`--topk`)
+**The "Vocabulary" Limit.**
+This limits the pool of possible "next sounds" to the top *K* most likely options.
+
+* **Paper Default:** `50`
+* **How it works:**
+    * `50` is a common default for Llama-based models. Lowering it (e.g., to `30`) can reduce "hallucinations" or random artifacts, but may make the audio sound dull.
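+
+As a rough illustration of the "vocabulary limit," the sketch below keeps only the *K* highest-scoring candidates and samples among them. It assumes plain top-k filtering followed by softmax sampling; `sample_top_k` is a hypothetical helper and the toy logits are made up for the example.
+
+```python
+import math
+import random
+
+
+def sample_top_k(logits, k, rng=random):
+    """Sample one token index from the k highest-scoring candidates only."""
+    # Indices of the k largest logits; everything else is excluded entirely.
+    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
+    peak = max(logits[i] for i in top)
+    weights = [math.exp(logits[i] - peak) for i in top]  # unnormalized softmax
+    return rng.choices(top, weights=weights, k=1)[0]
+
+
+logits = [0.1, 3.0, 2.0, -1.0]
+rng = random.Random(0)
+picks = {sample_top_k(logits, k=2, rng=rng) for _ in range(200)}
+print(sorted(picks))  # only indices drawn from the two best candidates
+```
+
+With `k=2`, indices `0` and `3` can never be chosen, no matter how many samples are drawn.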
+
+---
+
+## 🧪 Experimental "Recipes"
+
+These settings are suggestions based on how autoregressive transformers generally respond to these parameters. They are not official presets.
+
+### A. The "Aggressive / Specific" Fix
+*Use this if the model is ignoring your Genre tags (e.g., Metal, Techno, Rap).*
+* **Logic:** High CFG forces the genre; Lower Temp keeps the rhythm tight.
+* **Command:**
+    `--cfg_scale 3.0 --temperature 0.8`
+
+### B. The "Creative / Jazz" Flow
+*Use this for genres that benefit from improvisation or loose timing.*
+* **Logic:** Moderate CFG allows some freedom; Higher Temp encourages unique melodies.
+* **Command:**
+    `--cfg_scale 2.0 --temperature 1.1`
+
+### C. The "Safe / High Fidelity" (Paper Default)
+*Use this for Pop, Ballads, or when audio quality is the priority.*
+* **Logic:** Low CFG prioritizes the "Aesthetic" filter; Default Temp ensures standard variety.
+* **Command:**
+    `--cfg_scale 1.5 --temperature 1.0`
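+
+If you switch between recipes often, they can be kept in a small lookup table. The sketch below is a hypothetical convenience helper: the recipe names are invented for this example, only the `--cfg_scale` and `--temperature` flags come from this guide, and the script you pass the arguments to is left up to you.
+
+```python
+# Values copied from recipes A, B, and C above. The keys are illustrative names.
+RECIPES = {
+    "aggressive": {"cfg_scale": 3.0, "temperature": 0.8},
+    "creative": {"cfg_scale": 2.0, "temperature": 1.1},
+    "safe": {"cfg_scale": 1.5, "temperature": 1.0},
+}
+
+
+def recipe_args(name):
+    """Return the command-line arguments for a named recipe as a list of strings."""
+    args = []
+    for flag, value in RECIPES[name].items():
+        args += [f"--{flag}", str(value)]
+    return args
+
+
+print(" ".join(recipe_args("aggressive")))  # --cfg_scale 3.0 --temperature 0.8
+```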
diff --git a/TAGS_GUIDE.md b/TAGS_GUIDE.md
@@ -0,0 +1,44 @@
+# HeartMuLa - Tags Guide (Prompt Engineering)
+
+This guide is based on an analysis of the HeartMuLa research paper (Sections 3.2 & 6.2). The model uses a natural language tokenizer (Llama 3) rather than a fixed dictionary. To achieve stable generation, select tags from the 8 primary categories used during training.
+
+### The 8 Pillars of Training
+Each category is listed with an importance percentage: its "Selection Probability," meaning how often a tag from that category was included during training.
+
+* **Training Frequency:** Tags were "sampled" during training. Genre was included 95% of the time, while Instrument was included only 25% of the time.
+* **Model Expectations:** The model expects a Genre tag to function correctly. Without it, the generation lacks a clear structural anchor.
+* **Influence vs. Stability:** A higher percentage generally means a more stable influence. A 95% tag (Genre) is a "Strong Anchor," while a 10% tag (Topic) is a "Weak Hint" that may be ignored if it conflicts with stronger tags.
+* **The Strategy:** For maximum control, lean heavily on the top 4 categories (Genre, Timbre, Gender, Mood). Use lower-percentage tags only as "seasoning" once the main structure is set.
+
+### Official Categories
+
+1. **GENRE** (95% - MANDATORY)
+   Examples: Pop, Rock, Electronic, Hiphop, Jazz, Classical, Techno, Trance, Ambient.
+2. **TIMBRE** (50% - Sound Texture)
+   Examples: Soft, Warm, Husky, Bright, Dark, Distorted.
+3. **GENDER** (37% - Vocal Character)
+   Examples: Male, Female.
+4. **MOOD** (32% - Emotional Vibe)
+   Examples: Happy, Sad, Energetic, Joyful, Melancholic, Relaxing, Dark.
+5. **INSTRUMENT** (25% - Dominant Sounds)
+   Examples: Piano, Synthesizer, Acoustic Guitar, Electric Guitar, Bass, Drums, Strings, Violin.
+6. **SCENE** (20% - Listening Context)
+   Examples: Dance, Workout, Dating, Study, Cinematic, Party.
+7. **REGION** (12% - Cultural Influence)
+   Examples: K-pop, Latin, Western.
+8. **TOPIC** (10% - Lyrical Theme)
+   Examples: Love, Summer, Heartbreak.
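+
+The selection probabilities above can be simulated to see what a typical training prompt might have contained. This sketch assumes each category was included independently with its listed probability, which this guide does not confirm; `sample_categories` is a hypothetical helper.
+
+```python
+import random
+
+# Percentages from the category list above, expressed as probabilities.
+SELECTION_PROB = {
+    "genre": 0.95, "timbre": 0.50, "gender": 0.37, "mood": 0.32,
+    "instrument": 0.25, "scene": 0.20, "region": 0.12, "topic": 0.10,
+}
+
+
+def sample_categories(rng):
+    """Pick the categories present in one simulated training prompt."""
+    # Assumption: every category is an independent coin flip.
+    return [c for c, p in SELECTION_PROB.items() if rng.random() < p]
+
+
+rng = random.Random(0)
+trials = 10_000
+counts = {c: 0 for c in SELECTION_PROB}
+for _ in range(trials):
+    for category in sample_categories(rng):
+        counts[category] += 1
+
+for category, n in counts.items():
+    print(f"{category:<10} {n / trials:.2f}")
+```
+
+Under that assumption, nearly every simulated prompt carries a Genre tag while only about one in ten carries a Topic tag, which is why Genre acts as the anchor.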
+
+### Prompting Strategy: "Less is More"
+To maintain a strong anchor and avoid "Probability Interference," avoid conflicting tags.
+
+* **Semantic Conflict:** Prompting "Rock, Jazz" splits the model's attention, often resulting in "muddy" or generic arrangements.
+* **Anchor Stability:** One strong anchor provides a clear map. Multiple genres create conflicting maps, causing the AI to lose focus.
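+* **Recommendation:** Select one tag per category and be precise rather than broad. The main thing to avoid is pairing unrelated genres such as "Rock, Jazz"; the first example below pairs a broad genre with its own subgenre (Electronic, Techno), which is less likely to pull the model in two directions.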
+
+### Recommended Format
+Use a comma-separated list.
+
+**Examples:**
+* Electronic, Techno, Synthesizer, Dark, High Energy, Club
+* Pop, Piano, Female, Sad, Soft, Love, Acoustic
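+
+A prompt in this format can also be assembled programmatically. The sketch below is a hypothetical helper that accepts one tag per category, requires a genre, and orders the tags by the percentages listed earlier. Ordering by importance is an assumption made for readability; this guide does not state that tag order matters.
+
+```python
+# Categories in descending order of their listed selection probability.
+CATEGORY_ORDER = [
+    "genre", "timbre", "gender", "mood",
+    "instrument", "scene", "region", "topic",
+]
+
+
+def build_tag_prompt(tags):
+    """Join one tag per category into a comma-separated prompt string."""
+    unknown = set(tags) - set(CATEGORY_ORDER)
+    if unknown:
+        raise ValueError(f"unknown categories: {sorted(unknown)}")
+    if "genre" not in tags:
+        raise ValueError("a genre tag is required")
+    return ", ".join(tags[c] for c in CATEGORY_ORDER if c in tags)
+
+
+prompt = build_tag_prompt(
+    {"mood": "Sad", "genre": "Pop", "instrument": "Piano", "gender": "Female"}
+)
+print(prompt)  # Pop, Female, Sad, Piano
+```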