grid-world-rl

Implementations of MDP value iteration, MDP policy iteration, and Q-Learning in a toy grid-world setting.
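As a rough illustration of the first of these algorithms, here is a minimal value iteration sketch on a deterministic grid. It is not the code in this repository: the names `ACTIONS`, `step`, and `value_iteration`, the step reward of -1, and the single terminal goal cell are all assumptions made for the example.

```python
# Hypothetical toy setup; none of these names come from the repository.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action, rows, cols):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < rows and 0 <= nc < cols:
        return (nr, nc)
    return state


def value_iteration(rows, cols, goal, gamma=0.9, step_reward=-1.0, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    values = {(r, c): 0.0 for r in range(rows) for c in range(cols)}

    def backup(s, a):
        # One-step lookahead: immediate reward plus discounted next-state value.
        return step_reward + gamma * values[step(s, a, rows, cols)]

    while True:
        delta = 0.0
        for s in values:
            if s == goal:
                continue  # assumed terminal state, value stays at 0
            best = max(backup(s, a) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place update within the sweep
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        s: max(ACTIONS, key=lambda a: backup(s, a))
        for s in values
        if s != goal
    }
    return values, policy


values, policy = value_iteration(rows=1, cols=3, goal=(0, 2))
```

On this 1x3 corridor the greedy policy moves right from both non-terminal cells, since every step costs the same and only the goal ends the episode.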

TODO

The policy iteration implementation is suboptimal: it does not use the closed-form solution for the policy evaluation step. Pull requests are welcome.
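For reference, the closed form in question is presumably the standard one for a fixed policy: the values satisfy v = r_pi + gamma * P_pi v, so v = (I - gamma * P_pi)^-1 r_pi. The sketch below shows that step with NumPy under stated assumptions. The function name `evaluate_policy_closed_form` and the inputs `P_pi` (state-to-state transition matrix under the policy) and `r_pi` (expected one-step reward per state) are hypothetical and do not correspond to anything in `rl.py`.

```python
import numpy as np


def evaluate_policy_closed_form(P_pi, r_pi, gamma):
    """Solve (I - gamma * P_pi) v = r_pi for the state values v.

    P_pi: (n, n) array, row s holds next-state probabilities under the policy.
    r_pi: (n,) array, expected immediate reward in each state under the policy.
    Assumes gamma < 1, which makes the linear system non-singular.
    """
    n = P_pi.shape[0]
    # Solving the linear system avoids forming the matrix inverse explicitly.
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)


# Hypothetical three-state chain: 0 -> 1 -> 2, where state 2 is absorbing
# and yields no further reward.
P_pi = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])
r_pi = np.array([-1.0, -1.0, 0.0])

v = evaluate_policy_closed_form(P_pi, r_pi, gamma=0.9)
```

A direct solve costs roughly O(n^3) in the number of states, which is fine for a toy grid but may be slower than iterative evaluation on large state spaces.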

Files

.gitignore
LICENSE.txt
README.md
gridworld.py
qlearn.py
rl.py