(fix): Regenerate the system prompt to force the system not to reveal internal details #195
alvaro-mazcu wants to merge 8 commits into development from
```python
if self._are_the_tools_names_mentioned(llm_content):
    self.logger.warning(
        "Tool name leakage detected in LLM response; sending fallback message."
    )
    await whatsapp_client.send_message(
        user.wa_id, strings.get_string(StringCategory.ERROR, "tool_leakage")
    )
    record_messages_generated("tool_names_mentioned_error")
    return JSONResponse(content={"status": "ok"}, status_code=200)

self.logger.debug(
    f"Sending message to {user.wa_id}: {llm_responses[-1].content}"
)
```
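The excerpt calls `self._are_the_tools_names_mentioned(llm_content)` but does not show its body. Below is a minimal sketch of such a check. It assumes the tool names are available as plain strings and that whole-word, case-insensitive matching is what we want. The tool names and the helper names `build_tool_name_pattern` / `are_tool_names_mentioned` are hypothetical placeholders, not Twiga's actual tools or methods.

```python
from __future__ import annotations

import re
from typing import Iterable

# Hypothetical placeholders; the real list would come from wherever Twiga registers its tools.
TOOL_NAMES = ["search_knowledge", "generate_exercise"]


def build_tool_name_pattern(tool_names: Iterable[str]) -> re.Pattern[str]:
    """Compile one case-insensitive regex that matches any tool name as a whole word."""
    names = [name for name in tool_names if name]
    if not names:
        # (?!) never matches, so an empty tool list never flags a message.
        return re.compile(r"(?!)")
    alternatives = "|".join(re.escape(name) for name in names)
    # \b stops "search_knowledge" from matching inside a longer identifier
    # such as "my_search_knowledge_tool" (underscores count as word characters).
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


def are_tool_names_mentioned(content: str | None, pattern: re.Pattern[str]) -> bool:
    """Return True if the LLM output mentions any internal tool name verbatim."""
    if not content:
        return False
    return pattern.search(content) is not None


if __name__ == "__main__":
    pattern = build_tool_name_pattern(TOOL_NAMES)
    print(are_tool_names_mentioned("I used search_knowledge to find this.", pattern))
```

Compiling the pattern once and reusing it keeps the per-message cost low. Note that this only catches verbatim names: a paraphrase such as "the knowledge search tool" would slip through. That limitation is a reason to keep the system-prompt change as the primary defence.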
Nice, but something isn't working right. I get the warning in the terminal:

`WARNING: 2026-01-31 19:11:31 - app.services.messaging_service - Tool name leakage detected in LLM response; sending fallback message.`

but I still see the original LLM message, tool names included, in the chat instead of the error message.
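One thing worth checking, though the excerpt alone can't confirm it: the guard tests `llm_content`, while the debug line logs `llm_responses[-1].content`. If the text that actually goes out is a different variable from the one being checked, or if it is sent somewhere before this branch runs, the warning would fire and the original message would still reach the chat. Below is a minimal sketch of the intended ordering. It makes three assumptions: a hypothetical async `send` callable stands in for `whatsapp_client.send_message`, the tool-name regex is hypothetical, and the fallback text is a placeholder. The check runs on the exact string that would be sent, and the function returns right after sending the fallback.

```python
from __future__ import annotations

import asyncio
import re
from typing import Awaitable, Callable

# Hypothetical stand-ins for the WhatsApp client, the tool list and the fallback string.
SendFn = Callable[[str, str], Awaitable[None]]
TOOL_PATTERN = re.compile(r"\b(?:search_knowledge|generate_exercise)\b", re.IGNORECASE)
FALLBACK_TEXT = "Sorry, something went wrong. Please try again."


async def send_llm_reply(wa_id: str, outgoing_text: str, send: SendFn) -> str:
    """Send exactly one message: the LLM text if it is clean, otherwise the fallback.

    The check runs on the same string that would be sent, and the function
    returns immediately after the fallback, so the leaking text is never sent.
    Returns which branch was taken.
    """
    if TOOL_PATTERN.search(outgoing_text):
        await send(wa_id, FALLBACK_TEXT)
        return "fallback"
    await send(wa_id, outgoing_text)
    return "llm"


if __name__ == "__main__":
    async def print_send(wa_id: str, text: str) -> None:
        # Fake sender for local experimentation; prints instead of calling WhatsApp.
        print(f"-> {wa_id}: {text}")

    asyncio.run(send_llm_reply("example-wa-id", "I used search_knowledge for this.", print_send))
```

Passing the sender in as a parameter keeps the ordering easy to unit-test without a real WhatsApp connection.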
app/services/messaging_service.py
```python
if llm_responses:
    # Update the database with the responses
    await db.create_new_messages(llm_responses)
```
Doesn't this mean that the response containing tool names is saved to the database and then fetched as part of the history? Wouldn't it make sense to run the tool-name check before saving to the database, so that the history reflects the expected response?
Yes, that's correct, but I believe it's out of scope for this PR. We'll discuss it on Thursday and design the exact message history, since I expect this part to be refactored.
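When the history refactor happens, the reviewer's suggestion could look roughly like the sketch below: run the check before `db.create_new_messages`, so the stored history matches what the user actually saw. `Message`, its `role`/`content` fields, the tool-name pattern, the fallback text and `sanitize_for_history` are all hypothetical stand-ins, not Twiga's actual models or helpers.

```python
from __future__ import annotations

import dataclasses
import re

# Hypothetical stand-ins for the real tool list and fallback string.
TOOL_PATTERN = re.compile(r"\b(?:search_knowledge|generate_exercise)\b", re.IGNORECASE)
FALLBACK_TEXT = "Sorry, something went wrong. Please try again."


@dataclasses.dataclass(frozen=True)
class Message:
    # Hypothetical minimal message model; the real one lives in Twiga's db layer.
    role: str
    content: str | None


def sanitize_for_history(responses: list[Message]) -> list[Message]:
    """Return a copy of the responses with leaking assistant text replaced by the fallback.

    Only assistant messages are rewritten; other roles (e.g. tool results) are kept
    as-is, on the assumption that only assistant text is ever shown to the user.
    The input list and its messages are not modified.
    """
    cleaned: list[Message] = []
    for message in responses:
        if message.role == "assistant" and message.content and TOOL_PATTERN.search(message.content):
            cleaned.append(dataclasses.replace(message, content=FALLBACK_TEXT))
        else:
            cleaned.append(message)
    return cleaned
```

At the call site this would become `await db.create_new_messages(sanitize_for_history(llm_responses))`. Because the function returns new objects, the same sanitized list could also be used for sending, so the chat and the history cannot drift apart.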
Twiga keeps revealing internal tool names. This new system prompt instructs the chatbot not to reveal these details. Please test Twiga with the new prompt; I have tested it thoroughly and it has worked so far.
cc @jurmy24 @fredygerman