MAMDP Experiments

Code for the paper "How RL Agents Behave When Their Actions Are Modified" by Eric Langlois and Tom Everitt (AAAI 2021).

Install

pip install .

This installs the mamdp package along with several scripts prefixed by mamdp-.

Running Experiments

The following commands will reproduce the results described in the paper.

Run results are stored in the experiments/ directory and are re-used if available. If you change any parameter other than NUM_RUNS, first make sure that experiments/ does not contain past runs, otherwise the stale results will be re-used.
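
Because past results are re-used, it can help to check what is already on disk before changing parameters. The following is a minimal sketch, not part of the mamdp package: the helper name past_runs is hypothetical, and it assumes only that run outputs are ordinary files somewhere under experiments/.

```python
from pathlib import Path


def past_runs(experiment_dir):
    """Return the sorted files found under experiment_dir (empty if it is missing)."""
    root = Path(experiment_dir)
    if not root.is_dir():
        return []
    # Any regular file below the directory counts as output from a past run.
    return sorted(p for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    leftovers = past_runs("experiments")
    if leftovers:
        print(f"{len(leftovers)} file(s) from past runs found, e.g. {leftovers[0]}")
    else:
        print("experiments/ holds no past runs")
```

If the check reports leftover files, move or delete them before re-running make with new parameters.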

Train and evaluate the Simulation-Oversight environment

make -j<NUM_CPU_CORES> simulation-oversight

Training curves are saved to experiments/simulation-oversight/training.png and can be plotted manually with:

mamdp-plot-evaluations experiments/simulation-oversight/

Summarize the policies

mamdp-summarize-policies experiments/simulation-oversight/*.policies.json

Train and evaluate the Small Whisky-Gold environment

make -j<NUM_CPU_CORES> NUM_RUNS=10 whisky-gold-small

Summarize the Small Whisky-Gold strategies

State index 0 is the branch point between heading directly to the goal through the whisky (right; action = 3) and going around it (down; action = 2).

mamdp-summarize-policies experiments/whisky-gold-small/*.policies.json --argmax --state 0 --actions 2 3
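
To illustrate the kind of summary this command asks for, here is a minimal sketch of counting, across runs, which of two candidate actions each policy prefers at one state. It is an assumption-laden illustration, not the implementation of mamdp-summarize-policies: the function names are hypothetical, and the policy layout (a mapping from state index to a list of action probabilities) is made up for the example and is not the .policies.json format.

```python
from collections import Counter


def preferred_action(action_probs, actions):
    """Return the action in `actions` with the highest probability."""
    return max(actions, key=lambda a: action_probs[a])


def summarize(policies, state, actions):
    """Count, across runs, which of `actions` each policy prefers at `state`."""
    return Counter(preferred_action(policy[state], actions) for policy in policies)


# Three made-up runs; each maps state index -> probabilities for actions 0..3.
runs = [
    {0: [0.0, 0.1, 0.7, 0.2]},
    {0: [0.1, 0.0, 0.2, 0.7]},
    {0: [0.0, 0.0, 0.9, 0.1]},
]
print(summarize(runs, state=0, actions=[2, 3]))
```

Restricting the argmax to actions 2 and 3 means every run is classified as one of the two strategies at the branch point, even if some other action has nonzero probability there.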

Probability that the policy visits a state. 11 is the index of the whisky

mamdp-plot-policies experiments/whisky-gold-small/*.eval.json --state 9

Train and evaluate the Off-Switch environment

This experiment uses a fixed learning rate instead of 1/visit_count.
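
The difference between the two step-size schedules can be sketched as follows. This is an illustration under stated assumptions rather than the package's code: the function names and the example values are hypothetical, and the README specifies only that the alternative to a fixed rate is 1/visit_count.

```python
def learning_rate(visit_count, fixed=None):
    """Step size for a tabular update.

    With fixed=None the step size decays as 1/visit_count; otherwise the
    constant `fixed` is returned regardless of the visit count.
    """
    if fixed is not None:
        return fixed
    return 1.0 / visit_count


def update(value, target, visit_count, fixed=None):
    """Move `value` toward `target` by one step of the chosen size."""
    alpha = learning_rate(visit_count, fixed)
    return value + alpha * (target - value)


def run_updates(targets, fixed=None):
    """Apply `update` to each target in turn, starting from a value of 0."""
    value = 0.0
    for visit_count, target in enumerate(targets, start=1):
        value = update(value, target, visit_count, fixed)
    return value
```

With the 1/visit_count schedule the estimate equals the running mean of the targets seen so far, so early samples keep their weight. With a fixed rate the estimate is an exponentially weighted average that favours recent targets.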

make -j<NUM_CPU_CORES> NUM_RUNS=10 off-switch

Summarize the Off-Switch strategies

State index 11 is the branch point between detouring to the disable button (down; action = 2) and heading directly towards the goal (left; action = 1).

mamdp-summarize-policies experiments/off-switch/*.policies.json --argmax --state 11 --actions 1 2

Probability that the policy visits a state. 36 is the index of the off switch button state.

mamdp-plot-policies experiments/off-switch/*.eval.json --state 36

Development

Editable Install

python setup.py develop [--user]

Re-run this command to refresh the version number (based on git tags).

Testing

python -m pytest

Versioning

This project uses Semantic Versioning.

Versions are set exclusively via git tags:

git tag -a v0.1.2 -m "Version 0.1.2"
