Ankit1017/TalkToMe

Talk to Me

Welcome to the "Talk to Me" repository. The project combines audio recording, speech recognition, text generation, and text-to-speech conversion into a single voice pipeline: you speak, and the program answers aloud.

Key Features

  • Audio Recording: Capture high-quality audio using PyAudio.
  • Speech Recognition: Convert recorded audio to text with Google's Speech Recognition API.
  • Text Generation: Process the transcribed text and generate a response with Google Generative AI.
  • Text-to-Speech Conversion: Convert generated text to speech and save it as an MP3 file using gTTS.
  • Audio Playback: Play the generated audio using Pygame.
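
The recording step can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names `save_wav` and `record_wav`, the 16 kHz mono 16-bit format, and the five-second duration are all assumptions chosen for the example. `record_wav` needs PyAudio and a working microphone; `save_wav` uses only the standard library.

```python
import wave


def save_wav(path, frames, channels=1, sample_width=2, rate=16000):
    """Write raw PCM chunks to a WAV file with the standard-library wave module."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))


def record_wav(path, seconds=5, rate=16000, chunk=1024):
    """Record mono 16-bit audio from the default input device and save it.

    Hypothetical helper: requires PyAudio and a microphone.
    """
    import pyaudio  # imported lazily so save_wav works without PyAudio

    audio = pyaudio.PyAudio()
    stream = audio.open(format=pyaudio.paInt16, channels=1, rate=rate,
                        input=True, frames_per_buffer=chunk)
    try:
        # Read enough fixed-size chunks to cover the requested duration.
        frames = [stream.read(chunk) for _ in range(int(rate / chunk * seconds))]
    finally:
        stream.stop_stream()
        stream.close()
        sample_width = audio.get_sample_size(pyaudio.paInt16)
        audio.terminate()
    save_wav(path, frames, channels=1, sample_width=sample_width, rate=rate)
```

Calling `record_wav("input.wav")` would capture roughly five seconds of audio; the file name is a placeholder.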

Technologies Used

  • PyAudio: For capturing and handling audio data.
  • Wave: To save recorded audio in WAV format.
  • SpeechRecognition: For converting audio to text using Google's Speech Recognition API.
  • gTTS: For converting text to speech and saving as an MP3 file.
  • Pygame: To play audio files.
  • Google Generative AI: For generating a text response from the transcribed speech.
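
As a rough guide to how these libraries are typically called, the sketch below wraps each one in a small function. The function names, the `reply.mp3` file name, and the `gemini-pro` model name are assumptions made for illustration and may not match what the repository uses; an API key for Google Generative AI is required and must be supplied by the caller.

```python
def transcribe_wav(path):
    """Return the transcript of a WAV file, or None if the speech is unintelligible."""
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None


def generate_reply(prompt, api_key, model_name="gemini-pro"):
    """Send the prompt to a generative model and return its text reply.

    model_name is a placeholder; substitute a model available to your key.
    """
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    return model.generate_content(prompt).text


def speak(text, mp3_path="reply.mp3"):
    """Convert text to an MP3 with gTTS, then play it with Pygame."""
    from gtts import gTTS
    import pygame

    gTTS(text=text, lang="en").save(mp3_path)
    pygame.mixer.init()
    pygame.mixer.music.load(mp3_path)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():  # block until playback finishes
        pygame.time.wait(100)
```

The third-party imports sit inside each function so that a missing package only affects the step that needs it.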

How to Use

  1. Clone the repository:
    git clone https://github.com/Ankit1017/TalkToMe.git
    

The main script captures audio, converts it to text, processes the text with generative AI, and converts it back to audio. If the phrase "OK Google" is detected in the audio, the process halts; otherwise, it continues with further processing and playback.
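
The control flow described above might look like the sketch below. It is an assumption-laden outline rather than the repository's script: `run_turn`, `should_stop`, and the three callables passed in are hypothetical names, the stop-phrase match is assumed to be case-insensitive, and skipping a turn when nothing is recognized is a design choice made for the example.

```python
STOP_PHRASE = "ok google"  # assumed to be matched case-insensitively


def should_stop(transcript):
    """Return True when the transcript contains the stop phrase."""
    return STOP_PHRASE in transcript.lower()


def run_turn(listen, generate, speak):
    """Run one listen / generate / speak cycle.

    listen() returns a transcript (or None), generate(text) returns a reply,
    and speak(text) plays it. Returns False when the stop phrase is heard.
    """
    transcript = listen()
    if not transcript:
        return True  # nothing recognized: skip this turn, keep going
    if should_stop(transcript):
        return False  # halt before any generation or playback
    speak(generate(transcript))
    return True


if __name__ == "__main__":
    # Stand-in callables so the flow can be tried without a microphone.
    heard = iter(["tell me a joke", "OK Google"])
    while run_turn(lambda: next(heard), lambda t: "You said: " + t, print):
        pass
```

Passing the three steps in as functions keeps the loop independent of the audio hardware, so it can be exercised with plain strings.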

Contributing

We welcome contributions! Please feel free to submit issues and pull requests to improve this project.

License

This project is licensed under the MIT License.
