(fix): Regenerate the system prompt to force the system not to reveal internal details #195

Open
alvaro-mazcu wants to merge 8 commits into development from fix/avoid_showing_tool_names_in_messages

Conversation

@alvaro-mazcu
Member

Twiga keeps revealing internal tool names. This new system prompt instructs the chatbot not to reveal these details. Please test Twiga with the new prompt; I have tested it thoroughly and it has worked in my tests.

cc @jurmy24 @fredygerman

@alvaro-mazcu self-assigned this Jan 28, 2026
Comment on lines 243 to 255
    if self._are_the_tools_names_mentioned(llm_content):
        self.logger.warning(
            "Tool name leakage detected in LLM response; sending fallback message."
        )
        await whatsapp_client.send_message(
            user.wa_id, strings.get_string(StringCategory.ERROR, "tool_leakage")
        )
        record_messages_generated("tool_names_mentioned_error")
        return JSONResponse(content={"status": "ok"}, status_code=200)

    self.logger.debug(
        f"Sending message to {user.wa_id}: {llm_responses[-1].content}"
    )
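
The body of `_are_the_tools_names_mentioned` is not shown in this hunk. As a rough illustration only, a check of this kind could be a case-insensitive whole-word match against a list of tool names. The function name `mentions_tool_names` and the entries in `TOOL_NAMES` are hypothetical, and the actual implementation in the PR may differ.

```python
import re

# Hypothetical tool names; the real ones are not shown in this thread.
TOOL_NAMES = ["search_knowledge", "generate_exercise"]


def mentions_tool_names(text: str, tool_names: list[str] = TOOL_NAMES) -> bool:
    """Return True if any tool name appears in `text` as a whole word."""
    if not text or not tool_names:
        return False
    # Escape each name so it is matched literally, then require word boundaries.
    pattern = r"\b(?:" + "|".join(re.escape(name) for name in tool_names) + r")\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

A plain substring test would also work, but the word-boundary match avoids flagging a longer word that merely contains a tool name.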
Contributor

@Ben-Temming Jan 31, 2026

Nice, but something is not working right. I get this warning in the terminal:

WARNING: 2026-01-31 19:11:31 - app.services.messaging_service - Tool name leakage detected in LLM response; sending fallback message.

But I still see the original LLM message, with the tool names, in the chat instead of the error message.
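
One way to pin down the behavior described above is a small test that records every outgoing message and asserts that only the fallback is sent when leakage is detected. The sketch below is not the project's code: `FakeWhatsAppClient`, `handle_llm_reply`, `FALLBACK`, the dummy `wa_id`, and the leak check are all hypothetical stand-ins for the real handler and client.

```python
import asyncio

FALLBACK = "Sorry, something went wrong. Please try again."  # hypothetical text


class FakeWhatsAppClient:
    """Test double that records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    async def send_message(self, wa_id: str, text: str) -> None:
        self.sent.append((wa_id, text))


async def handle_llm_reply(client, wa_id, llm_content, leaks) -> None:
    """Send the fallback on leakage, otherwise the LLM content."""
    if leaks(llm_content):
        await client.send_message(wa_id, FALLBACK)
        return  # early return: the leaking text must never be sent
    await client.send_message(wa_id, llm_content)


def run_leak_scenario() -> list[tuple[str, str]]:
    """Run one leaking reply through the handler and return what was sent."""
    client = FakeWhatsAppClient()
    leaks = lambda text: "search_knowledge" in text  # hypothetical check
    asyncio.run(
        handle_llm_reply(client, "255700000000", "Calling search_knowledge...", leaks)
    )
    return client.sent
```

If an equivalent test passes against the real handler while the chat still shows the original message, that would suggest looking for another code path that sends the LLM output.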


    if llm_responses:
        # Update the database with the responses
        await db.create_new_messages(llm_responses)
Contributor

Does this not mean that the response containing the tool names is saved to the database and later fetched as part of the history? Would it not make sense to run the tool-name check before saving to the database, so that the history reflects the expected response?
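
The reordering suggested above could be sketched as follows, assuming stored messages are simple objects with a `content` field. `Message`, `sanitize_for_history`, `FALLBACK`, and the `leaks` callable are hypothetical names, not the project's API; the only point is that the leak check runs first, so the history stores what the user actually received.

```python
from dataclasses import dataclass, replace
from typing import Callable

FALLBACK = "Sorry, something went wrong. Please try again."  # hypothetical text


@dataclass(frozen=True)
class Message:
    role: str
    content: str


def sanitize_for_history(
    messages: list[Message], leaks: Callable[[str], bool]
) -> list[Message]:
    """Swap leaking assistant messages for the fallback before they are saved."""
    return [
        replace(m, content=FALLBACK)
        if m.role == "assistant" and leaks(m.content)
        else m
        for m in messages
    ]
```

In the handler, the sanitized list would then be passed to `db.create_new_messages` in place of the raw `llm_responses`.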

Member Author

Yes, that's correct, but I believe this is out of scope for this PR. We will talk about it on Thursday and design the exact message history then, as I believe that part will need a refactor.
