Skip to content

Driving an OpenManipulatorX on a WafflePi mobile base, using Whisper API to gauge low-level interfacing commands from high-level verbal user instruction.

Notifications You must be signed in to change notification settings

davidbcjeffreys/SpeakBot

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

28 Commits
 
 
 
 
 
 
 
 

Repository files navigation

SpeakBot

A high-level speech-to-action pipeline for mobile manipulators

image

Overview

Speakbot aims to derive a functional link between human & robot by leveraging spoken request from the operator to teleoperate a mobile manipulator, namely the ROBOTIS TurtleBot3 with OpenManipulator. Developed within ROS Noetic on Ubuntu 20.04, the pipeline utilises OpenAI's Whisper and GPT-3.5-Turbo LLM for audio transcription & context summarisation (abstract, compound statement/request → simple, high-level request) respectively. Utilization of both facets results in impressive interpreted request accuracy and effective summarisation. This repository aims to convey solution feasability within the Gazebo simulation environment: physical operation is not within the scope of this repository.

Waffle-Pi navigation is handled under the ROS navigation stack (move_base), whilst trajectory planning for the OpenManipulator is controlled through the MoveIt! package.

Note

Acknowledging ROS Noetic's recent EOL, migration from ROS1 (Noetic) to ROS2 (Jazzy) is certain. This repository serves as an archive of the ROS Noetic implementation. Considering this, updates will be realized on the main branch.

Associated Literature

The academic paper for SpeakBot can be found here.


Usage

Tip

The following installation assumes existing installation of ROS 1 (Distro: Noetic) and complimentary packages (Rviz, Gazebo) on Ubuntu 20.04. If your current distro differs, it is recommended to install the ROS Noetic via the ROS Official Installation Documentation - Noetic Distro , at least until the ROS 2 version is released - opt for the Desktop-Full Install option within the 'Ubuntu install of ROS Noetic' page.

Important

Ensure that your OpenAI API key is set as an environment variable in order to use the embedded APIs. The following link can provide some insight into completing this.

Installation & Setup

Prerequisites - Workspace setup

• Create and initialize workspace

mkdir -p ~/speakbot_ws/src
cd ~/speakbot_ws/
catkin_make

• Source workspace

source devel/setup.bash

• Set Python version

Speakbot uses Python 3.8.10. Ensure that the default Python langauge choice matches this. If not, install Python 3.8.10 and reset the prioritisation order for Python 3.8.10.

update-alternatives --config python3

Follow the on-screen instructions to prioritise Python 3.8.10, if not already prioritised.


Prerequisites - TurtleBot3 x OpenManipulator & Grasp Plugin Setup

This repository provides the setup process for installing the TurtleBot3 simulation files. Whilst following the install instructions, ensure that the repository is cloned to your workspace e.g. ~/catkin_ws

Gazebo Classic fails to handle physical grasping processes due to limitations with the ODE physics engine. Jennifer Buehlers Gazebo Classic Grasp Plugin in the Acknowledgements section enables object retrieval in the simulation environment. Follow the linked repository to add the necessary plugin in the following .urdf file. The following assumes that the TurtleBot3 cloned repository is located in ~/catkin_ws:

cd ~/catkin_ws/src/turtlebot3_manipulation/turtlebot3_manipulation_description/urdf/open_manipulator_x.gazebo.xacro

Post the code from Jennifer Buehlers repo within the open_manipulator_x.gazebo.xacro file as shown:

image

Remember to build & source the catkin_ws (or the workspace you just cloned the TurtleBot3 repository to)

SpeakBot installation

• Clone the repository into your current working directory

cd ~/speakbot_ws/src
git clone https://github.com/davidbcjeffreys/SpeakBot.git

• Build & source your workspace

catkin_make
source devel/setup.bash
Operation
  1. Launch SpeakBot simulation
roslaunch speakbot_launch Speakbot.launch

Upon starting the simulation, a Gazebo environment with multiple objects and 3 coloured blocks should appear.

  1. In a seperate terminal, launch SpeakBot control node
cd ~/speakbot_ws/src/speakbot/src/
python3 Speakbot.py
  1. Activate SpeakBot by stating the hotword: "Hello Speaker". The end-effector (gripper) should open and close to indicate that the SpeakBot is awaiting a request.
  2. Request for SpeakBot to retrieve one of the 3 coloured blocks in the environment

Example

User: "Hello Speaker"
SpeakBot: "LISTENING..."
User: "I'd really like the red block, could you grab it for me please?

Configuration/Development

Configurable parameters within SpeakBot can be listed as the following:

move_base (DWAPlannerROS) Dynamic Reconfigure

Feature allowing the responsiveness of the TurtleBot3 to be altered to favour a slower, careful path or a faster, instinctive path. SpeakBot dynamically recalibrates DWA parameters to suit either path planning choice, based on user request sentiment in GetInstruction.py. Alternatively, improvements to the DWA configurations can be altered through the rqt_reconfigure package. See link for more info.

GPT-3.5-Turbo Summarisation Layer Prompt

Enables compound user input to be simplified into a high-level request. The model is provided a prompt, highlighted in GetInstruction.py. The system persona can be altered to improve abstraction accuracy for tasks with higher complexity or ambiguous user requests.


Acknowledgments

Great inspiration was drawn from the following repositories; for further insight into the scripts utilised in SpeakBot, please consider looking into the following:

ROBOTIS TurtleBot3 x OpenManipulatorX
Gazebo Classic Grasp Plugin
OpenAI Whisper
SayCan Algorithm

About

Driving an OpenManipulatorX on a WafflePi mobile base, using Whisper API to gauge low-level interfacing commands from high-level verbal user instruction.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published