Speakbot aims to derive a functional link between human & robot by leveraging spoken request from the operator to teleoperate a mobile manipulator, namely the ROBOTIS TurtleBot3 with OpenManipulator. Developed within ROS Noetic on Ubuntu 20.04, the pipeline utilises OpenAI's Whisper and GPT-3.5-Turbo LLM for audio transcription & context summarisation (abstract, compound statement/request → simple, high-level request) respectively. Utilization of both facets results in impressive interpreted request accuracy and effective summarisation. This repository aims to convey solution feasability within the Gazebo simulation environment: physical operation is not within the scope of this repository.
Waffle-Pi navigation is handled under the ROS navigation stack (move_base), whilst trajectory planning for the OpenManipulator is controlled through the MoveIt! package.
Note
Acknowledging ROS Noetic's recent EOL, migration from ROS1 (Noetic) to ROS2 (Jazzy) is certain. This repository serves as an archive of the ROS Noetic implementation. Considering this, updates will be realized on the main branch.
The academic paper for SpeakBot can be found here.
Tip
The following installation assumes existing installation of ROS 1 (Distro: Noetic) and complimentary packages (Rviz, Gazebo) on Ubuntu 20.04. If your current distro differs, it is recommended to install the ROS Noetic via the ROS Official Installation Documentation - Noetic Distro , at least until the ROS 2 version is released - opt for the Desktop-Full Install option within the 'Ubuntu install of ROS Noetic' page.
Important
Ensure that your OpenAI API key is set as an environment variable in order to use the embedded APIs. The following link can provide some insight into completing this.
Installation & Setup
• Create and initialize workspace
mkdir -p ~/speakbot_ws/src
cd ~/speakbot_ws/
catkin_make
• Source workspace
source devel/setup.bash
• Set Python version
Speakbot uses Python 3.8.10. Ensure that the default Python langauge choice matches this. If not, install Python 3.8.10 and reset the prioritisation order for Python 3.8.10.
update-alternatives --config python3
Follow the on-screen instructions to prioritise Python 3.8.10, if not already prioritised.
This repository provides the setup process for installing the TurtleBot3 simulation files. Whilst following the install instructions, ensure that the repository is cloned to your workspace e.g. ~/catkin_ws
Gazebo Classic fails to handle physical grasping processes due to limitations with the ODE physics engine. Jennifer Buehlers Gazebo Classic Grasp Plugin in the Acknowledgements section enables object retrieval in the simulation environment. Follow the linked repository to add the necessary plugin in the following .urdf file. The following assumes that the TurtleBot3 cloned repository is located in ~/catkin_ws:
cd ~/catkin_ws/src/turtlebot3_manipulation/turtlebot3_manipulation_description/urdf/open_manipulator_x.gazebo.xacro
Post the code from Jennifer Buehlers repo within the open_manipulator_x.gazebo.xacro file as shown:
Remember to build & source the catkin_ws (or the workspace you just cloned the TurtleBot3 repository to)
• Clone the repository into your current working directory
cd ~/speakbot_ws/src
git clone https://github.com/davidbcjeffreys/SpeakBot.git
• Build & source your workspace
catkin_make
source devel/setup.bash
Operation
- Launch SpeakBot simulation
roslaunch speakbot_launch Speakbot.launch
Upon starting the simulation, a Gazebo environment with multiple objects and 3 coloured blocks should appear.
- In a seperate terminal, launch SpeakBot control node
cd ~/speakbot_ws/src/speakbot/src/
python3 Speakbot.py
- Activate SpeakBot by stating the hotword: "Hello Speaker". The end-effector (gripper) should open and close to indicate that the SpeakBot is awaiting a request.
- Request for SpeakBot to retrieve one of the 3 coloured blocks in the environment
User: "Hello Speaker"
SpeakBot: "LISTENING..."
User: "I'd really like the red block, could you grab it for me please?
Configuration/Development
Configurable parameters within SpeakBot can be listed as the following:
• move_base (DWAPlannerROS) Dynamic Reconfigure
Feature allowing the responsiveness of the TurtleBot3 to be altered to favour a slower, careful path or a faster, instinctive path. SpeakBot dynamically recalibrates DWA parameters to suit either path planning choice, based on user request sentiment in GetInstruction.py. Alternatively, improvements to the DWA configurations can be altered through the rqt_reconfigure package. See link for more info.
• GPT-3.5-Turbo Summarisation Layer Prompt
Enables compound user input to be simplified into a high-level request. The model is provided a prompt, highlighted in GetInstruction.py. The system persona can be altered to improve abstraction accuracy for tasks with higher complexity or ambiguous user requests.
Great inspiration was drawn from the following repositories; for further insight into the scripts utilised in SpeakBot, please consider looking into the following:
• ROBOTIS TurtleBot3 x OpenManipulatorX
• Gazebo Classic Grasp Plugin
• OpenAI Whisper
• SayCan Algorithm

