This project enables hands-free control of a computer screen through voice commands and hand gestures. It uses various Python libraries for speech recognition, gesture detection, browser automation, system monitoring, and more. This system aims to improve accessibility and provide a more intuitive user interface by combining voice and gesture-based interactions.
- Voice Commands: Control system tasks, search the web, and interact with applications using voice commands.
- Hand Gestures: Use hand movements detected through the camera to move the mouse, click, and interact with on-screen content.
- System Monitoring: Check CPU usage, memory status, and internet speed.
- Automated Browser Control: Perform automated web searches and navigate using voice commands.
- Joke Teller: Retrieve and read aloud jokes for user entertainment.
- Currency Conversion: Convert currencies using real-time exchange rates.
- Screenshots & Automation: Automate tasks such as taking screenshots and controlling system processes.
- pyttsx3: Text-to-speech conversion for voice feedback.
- speech_recognition: Recognizes voice commands and converts speech to text.
- wikipedia: Fetches Wikipedia content for voice searches.
- datetime: Retrieves the current time and date.
- pyjokes: Provides jokes for entertainment.
- cv2 (OpenCV): Handles video capture and image processing for hand gestures.
- winsound: Plays system sounds for notifications.
- psutil: Monitors system resources like CPU and memory.
- requests: Fetches data from the web for API integration.
- subprocess: Executes system commands and scripts.
- pyautogui: Automates keyboard and mouse interactions.
- speedtest: Measures internet speed.
- selenium: Automates browser tasks using web drivers.
- cvzone.HandTrackingModule: Detects hand gestures through video input.
- mouse: Controls mouse movements based on hand gestures.
- forex_python.converter: Converts currencies using real-time exchange rates.
- webbrowser: Automates opening and controlling web browsers.
- Speech Recognition: The program listens for voice commands using a microphone.
- Command Execution: Based on the recognized speech, specific actions like web searches, system control, or task automation are executed.
- Voice Feedback: The system responds with voice feedback using text-to-speech for confirmation and results.
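As an illustration of that listen–execute–respond loop, here is a minimal sketch built on the speech_recognition and pyttsx3 libraries listed above. The recognize_google call needs an internet connection, and the "time" handler is a hypothetical example of command routing, not the project's full command set:

```python
import datetime

import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init()            # text-to-speech engine for voice feedback
recognizer = sr.Recognizer()

def speak(text):
    """Read the given text aloud."""
    engine.say(text)
    engine.runAndWait()

def listen():
    """Capture one phrase from the microphone and return it as lowercase text."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)   # adapt to background noise
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio).lower()  # online recognition
    except sr.UnknownValueError:
        return ""                                     # speech was unintelligible

command = listen()
if "time" in command:                                 # hypothetical routing rule
    speak(datetime.datetime.now().strftime("It is %I:%M %p"))
```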
- Hand Tracking: Using the webcam, the program detects hand gestures with OpenCV and cvzone.HandTrackingModule.
- Gesture-based Control: Gestures can be mapped to different actions such as controlling the mouse, clicking, or switching between windows (see the sketch below).
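The gesture loop can be sketched roughly as follows, using cvzone's HandDetector together with the mouse package. The specific mapping (index finger alone moves the cursor, index plus middle finger clicks) and the 2x camera-to-screen scaling are illustrative assumptions, not necessarily the project's exact choices:

```python
import cv2
import mouse
from cvzone.HandTrackingModule import HandDetector

cap = cv2.VideoCapture(0)                      # webcam feed
detector = HandDetector(detectionCon=0.8, maxHands=1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hands, frame = detector.findHands(frame)   # detects hands, draws landmarks
    if hands:
        hand = hands[0]
        x, y = hand["lmList"][8][:2]           # landmark 8 = index fingertip
        fingers = detector.fingersUp(hand)     # e.g. [0, 1, 0, 0, 0]
        if fingers == [0, 1, 0, 0, 0]:         # only index up -> move cursor
            mouse.move(x * 2, y * 2)           # naive camera-to-screen scaling
        elif fingers[1] and fingers[2]:        # index + middle up -> click
            mouse.click()                      # fires each frame; real code would debounce
    cv2.imshow("Hand Control", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):      # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```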
- CPU and Memory Usage: Retrieves current system resource usage through psutil and announces it via voice feedback.
- Internet Speed Test: Measures internet speed using speedtest and reads the results aloud.
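A compact sketch of both checks with psutil and speedtest-cli; in the actual assistant these strings would be passed to the text-to-speech engine rather than printed:

```python
import psutil
import speedtest

# System resource snapshot
cpu = psutil.cpu_percent(interval=1)           # sample CPU load over 1 second
ram = psutil.virtual_memory().percent          # percentage of RAM in use
print(f"CPU usage is {cpu} percent, memory usage is {ram} percent")

# Internet speed test (can take tens of seconds to complete)
st = speedtest.Speedtest()
st.get_best_server()                           # pick the nearest test server
down_mbps = st.download() / 1_000_000          # results are in bits per second
up_mbps = st.upload() / 1_000_000
print(f"Download {down_mbps:.1f} Mbps, upload {up_mbps:.1f} Mbps")
```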
- Web Search: Perform Google searches or Wikipedia lookups using voice commands.
- Web Automation: Open and control browser windows using Selenium and voice commands.
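As a minimal Selenium sketch, assuming Chrome with a matching chromedriver available; in practice the search term would come from a recognized voice command rather than being hardcoded:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()                     # assumes chromedriver is set up
driver.get("https://www.google.com")

search_box = driver.find_element(By.NAME, "q")  # Google's search input field
search_box.send_keys("Python tutorials", Keys.ENTER)  # query from speech in practice
# call driver.quit() when finished with the session
```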
- Real-time Conversion: Converts currencies using forex_python.converter and provides real-time exchange rates.
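A short sketch with forex_python's CurrencyRates; the rates come from a live web service, so an internet connection is required:

```python
from forex_python.converter import CurrencyRates

rates = CurrencyRates()
rate = rates.get_rate("USD", "EUR")         # live USD -> EUR exchange rate
amount = rates.convert("USD", "EUR", 100)   # convert 100 USD to EUR
print(f"1 USD = {rate:.4f} EUR, so 100 USD is {amount:.2f} EUR")
```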
- Python 3.x
- A microphone for voice input.
- A webcam for gesture detection.
Install the necessary libraries using pip:
pip install pyttsx3 SpeechRecognition wikipedia pyjokes opencv-python psutil requests pyautogui speedtest-cli selenium cvzone mouse forex-python
(datetime, winsound, subprocess, and webbrowser are part of Python's standard library and need no installation. cvzone's hand tracking also depends on mediapipe, which may need to be installed separately.)
- Selenium WebDriver: Download and configure the appropriate WebDriver for your browser (e.g., Chrome, Firefox) for web automation. Place the WebDriver in the project directory.
- Webcam: Ensure your webcam is connected and functional for gesture detection.
- Clone the repository:
  git clone https://github.com/yourusername/screen-control-voice-gestures.git
- Run the Python script:
  python screen_control.py
- Start giving voice commands or use hand gestures for interaction.
- "What is the time?" – Fetches the current time.
- "Tell me a joke" – Reads a joke aloud.
- "Search Wikipedia for Python" – Searches Wikipedia for Python and reads the summary.
- "Open Google" – Opens Google in the web browser.
- Hand gestures – Control mouse movements and interact with the screen using hand gestures.
- Integrating voice-to-text for improved web search interactions.
- Adding more gestures for detailed control of specific applications.
- Expanding system commands for better automation and multi-tasking.
This project is licensed under the MIT License.
Thanks to the open-source community for the libraries and resources that made this project possible.