Code for the paper How RL Agents Behave When Their Actions Are Modified by Eric Langlois and Tom Everitt (AAAI 2021).
pip install .This installs the mamdp package along with several scripts prefixed by mamdp-.
The following commands will reproduce the results described in the paper.
Run results are stored in the experiments/ directory and are re-used if
available. If changing any parameters other than NUM_RUNS, make
sure that experiments/ does not contain past runs.
make -j<NUM_CPU_CORES> simulation-oversightTraining curves are saved to experiments/simulation-oversight/training.png
and can be plotted manually with:
mamdp-plot-evaluations experiments/simulation-oversight/mamdp-summarize-policies experiments/simulation-oversight/*.policies.jsonmake -j<NUM_CPU_CORES> NUM_RUNS=10 whisky-gold-small0 is the state index at the branch point between heading directly to the goal through the whisky (right; action = 3) or going around (down; action = 2)
mamdp-summarize-policies experiments/whisky-gold-small/*.policies.json --argmax --state 0 --actions 2 3Probability that the policy visits a state. 11 is the index of the whisky
mamdp-plot-policies experiments/whisky-gold-small/*.eval.json --state 9Uses a fixed learning rate instead of 1/visit_count.
make -j<NUM_CPU_CORES> NUM_RUNS=10 off-switch11 is the state index at the branch point between detouring to the disable button (down; action = 2) or heading directly towards the goal (left; action = 1).
mamdp-summarize-policies experiments/off-switch/*.policies.json --argmax --state 11 --actions 1 2Probability that the policy visits a state. 36 is the index of the off switch button state.
mamdp-plot-policies experiments/off-switch/*.eval.json --state 36python setup.py develop [--user]Re-run this command to refresh the version number (based on git tags).
python -m pytestUses Semantic Versioning.
Versions are set exclusively via git tags:
git -a v0.1.2 -m "Version 0.1.2"