Currently, when we create a transcription using the OpenAI API or whisper.cpp, we only get the raw dialogue; there is no way to tell who said what.
It would be better if the transcription were segmented by speaker (speaker diarization).
As a next step, allow users to rename the speakers as needed.
We will need to integrate Deepgram as a provider for this: https://github.com/deepgram/deepgram-python-sdk
The API key can be found in .env.
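A minimal sketch of how this could work with the Deepgram Python SDK. Assumptions: the v3+ `deepgram-sdk` package, a `DEEPGRAM_API_KEY` environment variable loaded from .env, and Deepgram's documented diarized response shape (word-level results where each word carries a `speaker` index). The exact client method names can vary between SDK versions, so treat the `transcribe_with_speakers` function as illustrative; the helpers `group_words_by_speaker` and `rename_speakers` are hypothetical names for the grouping and renaming steps described above.

```python
import os

def group_words_by_speaker(words):
    """Collapse diarized word-level results into (speaker, text) segments."""
    segments = []
    for w in words:
        speaker = f"Speaker {w['speaker']}"
        text = w.get("punctuated_word", w["word"])
        if segments and segments[-1][0] == speaker:
            # Same speaker as the previous word: extend the current segment.
            segments[-1] = (speaker, segments[-1][1] + " " + text)
        else:
            segments.append((speaker, text))
    return segments

def rename_speakers(segments, names):
    """Apply user-chosen display names, e.g. {"Speaker 0": "Alice"}."""
    return [(names.get(speaker, speaker), text) for speaker, text in segments]

def transcribe_with_speakers(audio_path):
    # Sketch only: method names may differ across deepgram-sdk versions.
    from deepgram import DeepgramClient, PrerecordedOptions

    client = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])
    options = PrerecordedOptions(model="nova-2", smart_format=True, diarize=True)
    with open(audio_path, "rb") as f:
        response = client.listen.rest.v("1").transcribe_file({"buffer": f}, options)
    data = response.to_dict()
    words = data["results"]["channels"][0]["alternatives"][0]["words"]
    return group_words_by_speaker(words)

# Offline demo with a hand-made fragment in the diarized response shape:
sample_words = [
    {"word": "hello", "punctuated_word": "Hello,", "speaker": 0},
    {"word": "there", "punctuated_word": "there.", "speaker": 0},
    {"word": "hi", "punctuated_word": "Hi!", "speaker": 1},
]
segments = group_words_by_speaker(sample_words)
print(rename_speakers(segments, {"Speaker 0": "Alice", "Speaker 1": "Bob"}))
```

The grouping step is independent of the provider call, so the same helpers can later be reused if another diarization backend is added.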