From f062890bce68a0770fdd74cf851789679ff407f2 Mon Sep 17 00:00:00 2001
From: Daniel Massicotte
Date: Sun, 8 Feb 2026 06:13:52 -0500
Subject: [PATCH 1/3] Add TAGS_GUIDE.md for HeartMuLa tag selection

This document provides guidelines on tag selection for the HeartMuLa model based on its training categories and strategies for effective prompting.
---
 TAGS_GUIDE.md | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
 create mode 100644 TAGS_GUIDE.md

diff --git a/TAGS_GUIDE.md b/TAGS_GUIDE.md
new file mode 100644
index 0000000..e26f986
--- /dev/null
+++ b/TAGS_GUIDE.md
@@ -0,0 +1,44 @@
+# HeartMuLa - Tags Guide (Prompt Engineering)
+
+This guide is based on the analysis of the HeartMuLa research paper (Sections 3.2 & 6.2). The model uses a natural language tokenizer (Llama 3) rather than a fixed dictionary. To achieve stable generation, select tags from the 8 primary categories used during training.
+
+### The 8 Pillars of Training
+Each category has an Importance percentage representing its "Selection Probability" during training.
+
+* **Training Frequency:** Tags were "sampled" during training. Genre was included 95% of the time, while Instrument was only included 25%.
+* **Model Expectations:** The model expects a Genre tag to function correctly. Without it, the generation lacks a clear structural anchor.
+* **Influence vs. Stability:** Higher percentages equal higher stability. A 95% tag (Genre) is a "Strong Anchor," while a 10% tag (Topic) is a "Weak Hint" that may be ignored if it conflicts with stronger tags.
+* **The Strategy:** For maximum control, lean heavily on the top 4 categories (Genre, Timbre, Gender, Mood). Use lower-percentage tags only as "seasoning" once the main structure is set.
+
+### Official Categories
+
+1. **GENRE** (95% - MANDATORY)
+   Examples: Pop, Rock, Electronic, Hiphop, Jazz, Classical, Techno, Trance, Ambient.
+2. **TIMBRE** (50% - Sound Texture)
+   Examples: Soft, Warm, Husky, Bright, Dark, Distorted.
+3. **GENDER** (37% - Vocal Character)
+   Examples: Male, Female.
+4. **MOOD** (32% - Emotional Vibe)
+   Examples: Happy, Sad, Energetic, Joyful, Melancholic, Relaxing, Dark.
+5. **INSTRUMENT** (25% - Dominant Sounds)
+   Examples: Piano, Synthesizer, Acoustic Guitar, Electric Guitar, Bass, Drums, Strings, Violin.
+6. **SCENE** (20% - Listening Context)
+   Examples: Dance, Workout, Dating, Study, Cinematic, Party.
+7. **REGION** (12% - Cultural Influence)
+   Examples: K-pop, Latin, Western.
+8. **TOPIC** (10% - Lyrical Theme)
+   Examples: Love, Summer, Heartbreak.
+
+### Prompting Strategy: "Less is More"
+To maintain a strong anchor and avoid "Probability Interference," avoid conflicting tags.
+
+* **Semantic Conflict:** Prompting "Rock, Jazz" splits the model's attention, often resulting in "muddy" or generic arrangements.
+* **Anchor Stability:** One strong anchor provides a clear map. Multiple genres create conflicting maps, causing the AI to lose focus.
+* **Recommendation:** Select only one tag per category. Be precise rather than broad.
+
+### Recommended Format
+Use a comma-separated list.
+
+**Examples:**
+* Electronic, Techno, Synthesizer, Dark, High Energy, Club
+* Pop, Piano, Female, Sad, Soft, Love, Acoustic

From fdbe069e94cdc0ff375412b9d2ba6265c81147b6 Mon Sep 17 00:00:00 2001
From: Daniel Massicotte
Date: Sun, 8 Feb 2026 17:10:16 -0500
Subject: [PATCH 2/3] Advanced Parameters Guide

Inference Parameters values and recommendations for genre adjustments.
---
 Advanced_Parameters_Guide.md | 83 ++++++++++++++++++++++++++++++++++++
 1 file changed, 83 insertions(+)
 create mode 100644 Advanced_Parameters_Guide.md

diff --git a/Advanced_Parameters_Guide.md b/Advanced_Parameters_Guide.md
new file mode 100644
index 0000000..011943f
--- /dev/null
+++ b/Advanced_Parameters_Guide.md
@@ -0,0 +1,83 @@
+# HeartMuLa - Advanced Parameters Guide
+
+**Disclaimer:**
+> The **Default** values listed below are explicitly cited from the official HeartMuLa research paper. The paper does not specify absolute minimum or maximum limits.
+> The *adjustments* and "recipes" are derived from standard Large Language Model (LLM) behavior and community experimentation. While HeartMuLa is based on Llama-3.2, results may vary depending on the specific model checkpoint.
+
+---
+
+## The "Mellow Bias": Why your Techno sounds like Pop
+Many users report that the model ignores aggressive tags (e.g., *Techno, Metal, High Energy*) and defaults to a "mellow" or "pop" sound.
+
+**The Likely Cause:**
+The HeartMuLa training dataset was filtered using **Audiobox-Aesthetic** scores to ensure high fidelity.
+* **Fact:** Aesthetic filters are trained to prefer "clean" audio.
+* **The Side Effect:** In generative audio, "clean" often correlates with Pop, Acoustic, or Soft textures, while "distorted" or "noisy" genres (like Dubstep or Rock) can be penalized.
+* **The Result:** When the model is unsure, it drifts toward the "safe" aesthetic (Mellow/Pop).
+
+To break this bias, you must adjust the **Inference Parameters** to force the model away from its "safe" center.
+
+---
+
+## 1. Classifier-Free Guidance (`--cfg_scale`)
+**The "Strictness" Slider.**
+This parameter controls how strongly the model forces the audio to match your text prompt (Tags/Lyrics) versus its internal training distribution.
+
+* **Paper Default:** `1.5`
+* **How it works (General AI Logic):**
+    * **Low (1.0 - 1.5):** The model is "loosely" guided by your tags. It prioritizes the internal "aesthetic" bias (smooth/safe audio).
+    * **High (2.0 - 4.0):** The model is "forced" to match your tags, even if the result is less "aesthetically safe."
+
+**Community Observation:**
+Users have reported that the default `1.5` is often too weak for specific genres. Increasing this value has helped users generate genres like R&B that were previously generating as Pop.
+
+**Recommendation:**
+If your genre is being ignored, **increase** this value. Start with `2.5` and go up to `4.0` if needed.
+
+---
+
+## 2. Temperature (`--temperature`)
+**The "Randomness" Slider.**
+This controls the probability distribution for the next audio token.
+
+* **Paper Default:** `1.0`
+* **How it works (General AI Logic):**
+    * **Lower (< 1.0):** The model becomes "conservative." It picks only the most likely sounds. This usually results in more repetitive, structured, and coherent rhythms.
+    * **Higher (> 1.0):** The model becomes "creative" and takes risks. This adds variety but increases the chance of chaos or the melody falling apart.
+
+**Recommendation:**
+For genres requiring strict rhythm (Techno, House), try **lowering** this slightly to `0.8` or `0.9` to lock in the groove.
+
+---
+
+## 3. Top-K (`--topk`)
+**The "Vocabulary" Limit.**
+This limits the pool of possible "next sounds" to the top *K* most likely options.
+
+* **Paper Default:** `50`
+* **How it works:**
+    * A standard setting for Llama-based models. Lowering this (e.g., to 30) can reduce "hallucinations" or random artifacts, but may make the audio sound dull.
+
+---
+
+## 🧪 Experimental "Recipes"
+
+These settings are suggestions based on how Autoregressive Transformers generally respond to these parameters. They are not official presets.
+
+### A. The "Aggressive / Specific" Fix
+*Use this if the model is ignoring your Genre tags (e.g., Metal, Techno, Rap).*
+* **Logic:** High CFG forces the genre; Lower Temp keeps the rhythm tight.
+* **Command:**
+  `--cfg_scale 3.0 --temperature 0.8`
+
+### B. The "Creative / Jazz" Flow
+*Use this for genres that benefit from improvisation or loose timing.*
+* **Logic:** Moderate CFG allows some freedom; Higher Temp encourages unique melodies.
+* **Command:**
+  `--cfg_scale 2.0 --temperature 1.1`
+
+### C. The "Safe / High Fidelity" (Paper Default)
+*Use this for Pop, Ballads, or when audio quality is the priority.*
+* **Logic:** Low CFG prioritizes the "Aesthetic" filter; Default Temp ensures standard variety.
+* **Command:**
+  `--cfg_scale 1.5 --temperature 1.0`

From 9aa2ec08e376b69bce85015545d3840172dabaa3 Mon Sep 17 00:00:00 2001
From: Daniel Massicotte
Date: Sun, 8 Feb 2026 17:13:21 -0500
Subject: [PATCH 3/3] Rename Advanced_Parameters_Guide.md to ADV_PARAMETERS_GUIDE.md

---
 Advanced_Parameters_Guide.md => ADV_PARAMETERS_GUIDE.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename Advanced_Parameters_Guide.md => ADV_PARAMETERS_GUIDE.md (100%)

diff --git a/Advanced_Parameters_Guide.md b/ADV_PARAMETERS_GUIDE.md
similarity index 100%
rename from Advanced_Parameters_Guide.md
rename to ADV_PARAMETERS_GUIDE.md
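
The three inference parameters described in the parameters guide (`--cfg_scale`, `--temperature`, `--topk`) can be illustrated with a minimal sketch of one sampling step in a generic autoregressive model. This is not HeartMuLa's code: the patches do not show the implementation, so the function names below are hypothetical, and the guidance step assumes the common classifier-free guidance formulation `uncond + scale * (cond - uncond)`, which the guide does not confirm for this model.

```python
import math
import random


def apply_cfg(cond_logits, uncond_logits, cfg_scale):
    """Assumed standard CFG mix: push logits away from the unconditional
    prediction and toward the prompt-conditioned one."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def top_k_filter(logits, k):
    """Keep the k largest logits and mask the rest with -inf.
    Ties at the threshold are all kept, so slightly more than k may survive."""
    if k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature (must be > 0), then normalize.
    Values below 1.0 sharpen the distribution; values above 1.0 flatten it."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(cond_logits, uncond_logits,
                      cfg_scale=1.5, temperature=1.0, topk=50, rng=random):
    """One hypothetical sampling step; defaults mirror the guide's paper defaults."""
    guided = apply_cfg(cond_logits, uncond_logits, cfg_scale)
    filtered = top_k_filter(guided, topk)
    probs = softmax_with_temperature(filtered, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Toy 4-token vocabulary: the prompt favors token 2, the prior favors token 0.
    cond = [1.0, 0.5, 2.0, -1.0]
    uncond = [2.0, 0.5, 0.5, -1.0]
    for scale in (1.0, 1.5, 3.0):
        probs = softmax_with_temperature(apply_cfg(cond, uncond, scale), 0.8)
        print(scale, [round(p, 3) for p in probs])
```

Under this assumed formulation, a scale of `1.0` reduces to the plain conditional prediction, and larger scales move probability mass further toward whatever the prompt favors. That is consistent with the guide's advice to raise `--cfg_scale` when a genre tag is being ignored, though the behavior of the real model may differ.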