caal-ministral is live! 🚀 #45
Replies: 10 comments
-
|
Exciting news! Looking forward to testing this out. Happy to provide feedback once I’ve had some hands-on time with it. Also, if useful, I’d be open to fine-tuning a Ministral-3 14B variant for CAAL as well. I have a 5090 with plenty of VRAM, so this won't be an issue. |
Beta Was this translation helpful? Give feedback.
-
FeedbackThe fine-tuned model is excellent. Out of all interactions, it only missed one tool call, which it corrected in the very next response. Overall, tool calling accuracy and conversational flow were impressive. I’d be keen to see this applied to larger models if you choose to release the dataset. Settings
ConversationMe: Hey Cal, can you get Elon Musk's X profile? CAAL: Elon Musk's X profile shows he has about 234 million followers and has been on the platform since June 2009. You can't send him a DM, and he's not following you. Me: What's his pinned tweet? CAAL: Elon Musk's pinned tweet says: "Building an interstellar civilization" Me: How many likes does it have? CAAL: Elon Musk's pinned tweet has 26,181 likes. It's about building an interstellar civilization. Me: How many times has it been quoted? CAAL: Elon Musk's pinned tweet has been quoted 418 times. Me: And how many replies does it have currently? CAAL: Elon Musk's pinned tweet has 4,364 replies. That's a lot of engagement for a single tweet! Me: Am I currently following him? CAAL: No, you are not currently following Elon Musk. Me: Can you like his pinned tweet for me? CAAL: I can't like tweets on your behalf, but I can help you find the tweet ID so you can like it yourself. Would you like me to retrieve the exact tweet ID for Elon Musk's pinned tweet? Me: You already did earlier. Just like it now. CAAL: I've liked Elon Musk's pinned tweet for you. It's now at 26,188 likes. Me: What's the weather in Adelaide currently? CAAL: Adelaide is currently 21 degrees with a 5 percent chance of rain. It feels like 18 degrees with 48 percent humidity. Me: Will I need an umbrella or UV protection on Thursday? CAAL: Adelaide's forecast for Thursday is sunny with only a 5 percent chance of rain. You won't need an umbrella, but UV protection would be a good idea since the UV index is likely to be high. Me: Can you search the latest news on tech stocks? How's QQQ doing? CAAL: The QQQ ETF is doing well today, up about half a percent. It's currently trading within its 52-week range of four hundred twenty-three to six hundred thirty-seven. |
Beta Was this translation helpful? Give feedback.
-
|
Thanks for the test and feedback! This reflects the testing I did as well. I was also impressed with the improved tool handling and conversation flow that the fine-tuning provided. As the tool registry grows, the performance should improve as well. The idea is to periodically retrain as new tools come into the tool registry. The training is a mix of real and synthetic tools but that ratio will shift as more real tools are available. Synthetic tools are good for pattern training but real tools are even better as then the model is already familiar with that tool and its available actions, parameters and response format. I’m going to do a retrain in the next couple days with your added x-twitter tool, as well as many more examples of chain tool calling to take advantage of the new ability I pushed to CAAL last night. The ability to call one tool and use the response in the next tool, all in one prompt, up to 5 in a row. 5 seemed like a reasonable max to prevent infinite looping, I’d be surprised if these small local models could string together 5 tools in a row. Ex. User: Email Abdul and let him know that his x-twitter tool is impressive. I’m in the market for a 3090 with 24gb so that I can train the 14b model like you suggested. I might have to take you up on your offer to provide the training horsepower. I will let you know. Cheers |
Beta Was this translation helpful? Give feedback.
-
|
I haven’t tried the chain tool calling yet, but that sounds incredibly useful. 5 tools in a row seems like a solid starting point, and like your tool cache size, it could always be made configurable (could be lowered for example for users with 8gb cards). Good luck with the GPU shopping. 3090 costs around $3,500 here in Aus, so I hope prices are more reasonable in Canada! xD And yes, please don’t hesitate to reach out if you’d like me to help train any larger models, not just the ministral 14b. I’ve successfully fine-tuned 24b models before, though they can push VRAM usage up to ~31gb, which is a bit more risky. |
Beta Was this translation helpful? Give feedback.
-
|
I just pushed a new version of the model that was trained on an enhanced dataset that includes tool chaining. Including 205 different examples of using your x-twitter tool. Can you do some testing with this version? ollama pull coreworxlab/caal-ministralHere's a quick test I did: me: "Like Elon Musk's latest tweet." It chained 4 tool calls in a row, using the Edit: make sure you have temperature set to 0.15 in settings. I just realized it was defaulting to 0.7. 0.15 is the recommended on ministral-3 for instruct. I've pushed a commit to change the default temp. |
Beta Was this translation helpful? Give feedback.
-
|
Pulled the updated model and ran a series of tests focused on chaining. NoteTool chaining would not work for me using the standard Docker workflow, even after fully pruning images, containers, networks, and volumes. Adding the ResultsMe: i need you to read out my most recent tweet Me: now can you like @NVIDIA's latest tweet Me: can you check the weather in sydney tomorrow, then DM it to @AbShahzeb Me: Can you compare Sydney's weather tomorrow with Montreal's? Me: can you confirm with the weather tools? Me: Find Melbourne's forecast on Sunday, then compare it to Toronto's forecast on Monday. How do they differ? Me: check the current weather in Hobart, and also check the current weather in Quebec City. Then tweet a brief summary of the difference only. Me: Get Sydney's weather forecast on Monday, then DM it to @AbShahzeb on X. Me: Get elon musk's latest tweet, like it, then DM it word for word to @AbShahzeb on X. |
Beta Was this translation helpful? Give feedback.
-
|
Thanks for the testing. Here's what I see:
We have 3 levers to pull: prompt, tool descriptions and training. The prompt should be generic for all users. I added a one liner just now about tool chaining, but we don't want to oversteer with the prompt. Tool descriptions are a lever to pull. Maybe the weather tools need to instruct use of those tools over web_search. This is something we can test. Training: I have 205 examples of in-suite x_twitter chaining examples. But limited cross-tool examples. I'm going to expand that now, along with further weather tool training. cmac |
Beta Was this translation helpful? Give feedback.
-
|
Just pushed a new model trained on an enhanced dataset targeting every issue you found: ollama pull coreworxlab/caal-ministralWhat changed in training:
A couple tests I ran: "Get Elon Musk's latest tweet then email it to Corey" 4 tool call chain across 3 different tools, used real data from each step. That's the pattern that was failing before. "Get the current Melbourne weather and tweet it" "Tweet posted: It's 26 degrees in Melbourne with a 0 percent chance of rain. Perfect weather!" I think it does better right now if you instruct it in the same sequence as the tool calls. Get Elon Musk's latest tweet and email it to Corey. I'll have to enhance the training data again to included phrases that mix the order around. Give it a go! cmac |
Beta Was this translation helpful? Give feedback.
-
|
HASS Training Update Used your Home Assistant device export to train the model on hass_control and hass_get_state tools. What we extracted from your data:
What we trained on (121 examples): hass_control actions:
hass_get_state — querying device states, sensor readings, temperatures, humidity Mixed real + generic device names so the model generalizes beyond your specific setup. All included in the latest model: coreworxlab/caal-ministral:20260206 ollama pull coreworxlab/caal-ministral:latestor ollama pull coreworxlab/caal-ministral:20260206Let me know how it goes! cmac |
Beta Was this translation helpful? Give feedback.
-
|
All, I just realized that when I pushed You'll have to re-pull if you pulled in the last day as I just fixed it.
cmac |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Fine-tuned model for CAAL tool calling is available.
Based on Ministral-3 8B, trained on the CAAL Tool API. Set it as your model in CAAL settings.
Looking for feedback, please.
More details coming soon.
Beta Was this translation helpful? Give feedback.
All reactions