RLHF Wrapper for OpenAI's Gym Library
git clone <REPO>
cd RLHF-GYM
chmod u+x build.sh run.sh
./build.sh-
For the second run so on, you can just run the following commmand:
./run.sh -
The
appdirectory of the container is mounted to theappdirectory of the host machine. So, you can edit the code in the host machine and run it in the container. -
Please add any new library that you work with to the requirements.txt file.
-
appdirectory contains the source code of the project in the container. The following are in this directory:-
datasetdirectory includes the synthetic dataset that is used in the project to mimic the human preference data for the reward modeling. -
modelsdirectory includes the trained models that are used in the project. -
srccontains the source code for the RLHF specific wrappers and the training code for the RLHF.-
data/synthetic-generation.pyis the code to generate the synthetic preference dataset. -
environment.pyis the MiniGrid environment wrapper for the RLHF. -
The rest of the scripts belong to the RLHF training code.
-
-