SpeakBot

A high-level speech-to-action pipeline for mobile manipulators

Overview

Speakbot aims to derive a functional link between human & robot by leveraging spoken request from the operator to teleoperate a mobile manipulator, namely the ROBOTIS TurtleBot3 with OpenManipulator. Developed within ROS Noetic on Ubuntu 20.04, the pipeline utilises OpenAI's Whisper and GPT-3.5-Turbo LLM for audio transcription & context summarisation (abstract, compound statement/request → simple, high-level request) respectively. Utilization of both facets results in impressive interpreted request accuracy and effective summarisation. This repository aims to convey solution feasability within the Gazebo simulation environment: physical operation is not within the scope of this repository.

Waffle-Pi navigation is handled under the ROS navigation stack (move_base), whilst trajectory planning for the OpenManipulator is controlled through the MoveIt! package.

Note

Acknowledging ROS Noetic's recent EOL, migration from ROS1 (Noetic) to ROS2 (Jazzy) is certain. This repository serves as an archive of the ROS Noetic implementation. Considering this, updates will be realized on the main branch.

Associated Literature

The academic paper for SpeakBot can be found here.

Usage

Tip

The following installation assumes existing installation of ROS 1 (Distro: Noetic) and complimentary packages (Rviz, Gazebo) on Ubuntu 20.04. If your current distro differs, it is recommended to install the ROS Noetic via the ROS Official Installation Documentation - Noetic Distro , at least until the ROS 2 version is released - opt for the Desktop-Full Install option within the 'Ubuntu install of ROS Noetic' page.

Important

Ensure that your OpenAI API key is set as an environment variable in order to use the embedded APIs. The following link can provide some insight into completing this.

Installation & Setup

Prerequisites - Workspace setup

• Create and initialize workspace

mkdir -p ~/speakbot_ws/src
cd ~/speakbot_ws/
catkin_make

• Source workspace

source devel/setup.bash

• Set Python version

Speakbot uses Python 3.8.10. Ensure that the default Python langauge choice matches this. If not, install Python 3.8.10 and reset the prioritisation order for Python 3.8.10.

update-alternatives --config python3

Follow the on-screen instructions to prioritise Python 3.8.10, if not already prioritised.

Prerequisites - TurtleBot3 x OpenManipulator & Grasp Plugin Setup

This repository provides the setup process for installing the TurtleBot3 simulation files. Whilst following the install instructions, ensure that the repository is cloned to your workspace e.g. ~/catkin_ws

Gazebo Classic fails to handle physical grasping processes due to limitations with the ODE physics engine. Jennifer Buehlers Gazebo Classic Grasp Plugin in the Acknowledgements section enables object retrieval in the simulation environment. Follow the linked repository to add the necessary plugin in the following .urdf file. The following assumes that the TurtleBot3 cloned repository is located in ~/catkin_ws:

cd ~/catkin_ws/src/turtlebot3_manipulation/turtlebot3_manipulation_description/urdf/open_manipulator_x.gazebo.xacro

Post the code from Jennifer Buehlers repo within the open_manipulator_x.gazebo.xacro file as shown:

Remember to build & source the catkin_ws (or the workspace you just cloned the TurtleBot3 repository to)

SpeakBot installation

• Clone the repository into your current working directory

cd ~/speakbot_ws/src
git clone https://github.com/davidbcjeffreys/SpeakBot.git

• Build & source your workspace

catkin_make
source devel/setup.bash

Operation

Launch SpeakBot simulation

roslaunch speakbot_launch Speakbot.launch

Upon starting the simulation, a Gazebo environment with multiple objects and 3 coloured blocks should appear.

In a seperate terminal, launch SpeakBot control node

cd ~/speakbot_ws/src/speakbot/src/
python3 Speakbot.py

Activate SpeakBot by stating the hotword: "Hello Speaker". The end-effector (gripper) should open and close to indicate that the SpeakBot is awaiting a request.
Request for SpeakBot to retrieve one of the 3 coloured blocks in the environment

Example

User: "Hello Speaker"
SpeakBot: "LISTENING..."
User: "I'd really like the red block, could you grab it for me please?

Configuration/Development

Configurable parameters within SpeakBot can be listed as the following:

• move_base (DWAPlannerROS) Dynamic Reconfigure

Feature allowing the responsiveness of the TurtleBot3 to be altered to favour a slower, careful path or a faster, instinctive path. SpeakBot dynamically recalibrates DWA parameters to suit either path planning choice, based on user request sentiment in GetInstruction.py. Alternatively, improvements to the DWA configurations can be altered through the rqt_reconfigure package. See link for more info.

• GPT-3.5-Turbo Summarisation Layer Prompt

Enables compound user input to be simplified into a high-level request. The model is provided a prompt, highlighted in GetInstruction.py. The system persona can be altered to improve abstraction accuracy for tasks with higher complexity or ambiguous user requests.

Acknowledgments

Great inspiration was drawn from the following repositories; for further insight into the scripts utilised in SpeakBot, please consider looking into the following:

• ROBOTIS TurtleBot3 x OpenManipulatorX
• Gazebo Classic Grasp Plugin
• OpenAI Whisper
• SayCan Algorithm

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
speakbot		speakbot
speakbot_launch		speakbot_launch
CMakeLists.txt		CMakeLists.txt
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SpeakBot

A high-level speech-to-action pipeline for mobile manipulators

Overview

Associated Literature

Usage

Prerequisites - Workspace setup

Prerequisites - TurtleBot3 x OpenManipulator & Grasp Plugin Setup

SpeakBot installation

Example

Acknowledgments

About

Uh oh!

Releases

Packages

Languages

davidbcjeffreys/SpeakBot

Folders and files

Latest commit

History

Repository files navigation

SpeakBot

A high-level speech-to-action pipeline for mobile manipulators

Overview

Associated Literature

Usage

Prerequisites - Workspace setup

Prerequisites - TurtleBot3 x OpenManipulator & Grasp Plugin Setup

SpeakBot installation

Example

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages